Rec'd PCT/PTO 26 APR 2001



FORM PTO-1390 (Modified) 
(REV 11-98)



U.S. DEPARTMENT OF COMMERCE PATENT AND TRADEMARK OFFICE 

TRANSMITTAL LETTER TO THE UNITED STATES 
DESIGNATED/ELECTED OFFICE (DO/EO/US) 
CONCERNING A FILING UNDER 35 U.S.C. 371 



ATTORNEY'S DOCKET NUMBER 
112740-213 



U.S. APPLICATION NO. (IF KNOWN, SEE 37 CFR 

09/830497 



INTERNATIONAL APPLICATION NO. 
PCT/DE00/02917 



INTERNATIONAL FILING DATE 
August 25, 2000 



PRIORITY DATE CLAIMED 

August 26, 1999 



TITLE OF INVENTION 
METHOD FOR TRAINING A SPEAKER RECOGNITION SYSTEM 



APPLICANT(S) FOR DO/EO/US 
Marcin KUROPATWINSKI 



Applicant herewith submits to the United States Designated/Elected Office (DO/EO/US) the following items and other information: 



1. ☒ This is a FIRST submission of items concerning a filing under 35 U.S.C. 371.

2. □ This is a SECOND or SUBSEQUENT submission of items concerning a filing under 35 U.S.C. 371.

3. ☒ This is an express request to begin national examination procedures (35 U.S.C. 371(f)) at any time rather than delay
   examination until the expiration of the applicable time limit set in 35 U.S.C. 371(b) and PCT Articles 22 and 39(1).

4. □ A proper Demand for International Preliminary Examination was made by the 19th month from the earliest claimed priority date.

5. ☒ A copy of the International Application as filed (35 U.S.C. 371(c)(2))

   a. ☒ is transmitted herewith (required only if not transmitted by the International Bureau).

   b. □ has been transmitted by the International Bureau.

   c. □ is not required, as the application was filed in the United States Receiving Office (RO/US).

6. □ A translation of the International Application into English (35 U.S.C. 371(c)(2)).

7. ☒ A copy of the International Search Report (PCT/ISA/210).

8. ☒ Amendments to the claims of the International Application under PCT Article 19 (35 U.S.C. 371(c)(3))

   a. □ are transmitted herewith (required only if not transmitted by the International Bureau).

   b. □ have been transmitted by the International Bureau.

   c. □ have not been made; however, the time limit for making such amendments has NOT expired.

   d. ☒ have not been made and will not be made.

9. □ A translation of the amendments to the claims under PCT Article 19 (35 U.S.C. 371(c)(3)).

10. ☒ An oath or declaration of the inventor(s) (35 U.S.C. 371(c)(4)).

11. □ A copy of the International Preliminary Examination Report (PCT/IPEA/409)

12. □ A translation of the annexes to the International Preliminary Examination Report under PCT Article 36
    (35 U.S.C. 371(c)(5)).



Items 13 to 20 below concern document(s) or information included: 

13. □ An Information Disclosure Statement under 37 CFR 1.97 and 1.98. 

14. □ An assignment document for recording. A separate cover sheet in compliance with 37 CFR 3.28 and 3.31 is included.

15. □ A FIRST preliminary amendment 

16. □ A SECOND or SUBSEQUENT preliminary amendment. 

17. □ A substitute specification. 

18. □ A change of power of attorney and/or address letter. 

19. ☒ Certificate of Mailing by Express Mail

20. ☒ Other items or information:



Return Receipt Postcard 






21. The following fees are submitted:                                    CALCULATIONS PTO USE ONLY

BASIC NATIONAL FEE (37 CFR 1.492(a)(1)-(5)):

□ Neither international preliminary examination fee (37 CFR 1.482) nor
  international search fee (37 CFR 1.445(a)(2)) paid to USPTO
  and International Search Report not prepared by the EPO or JPO          $1,000.00

☒ International preliminary examination fee (37 CFR 1.482) not paid to
  USPTO but International Search Report prepared by the EPO or JPO          $860.00

□ International preliminary examination fee (37 CFR 1.482) not paid to USPTO
  but international search fee (37 CFR 1.445(a)(2)) paid to USPTO            $710.00

□ International preliminary examination fee paid to USPTO (37 CFR 1.482)
  but all claims did not satisfy provisions of PCT Article 33(1)-(4)         $690.00

□ International preliminary examination fee paid to USPTO (37 CFR 1.482)
  and all claims satisfied provisions of PCT Article 33(1)-(4)               $100.00

ENTER APPROPRIATE BASIC FEE AMOUNT =                                         $860.00

Surcharge of $130.00 for furnishing the oath or declaration later than
□ 20 □ 30 months from the earliest claimed priority date (37 CFR 1.492(e)).   $0.00

CLAIMS                 NUMBER FILED     NUMBER EXTRA     RATE
Total claims                  - 20 =    0                x $18.00             $0.00
Independent claims            - 3 =                      x $80.00             $0.00
Multiple Dependent Claims (check if applicable).   □                          $0.00

TOTAL OF ABOVE CALCULATIONS =                                                $860.00

Reduction of 1/2 for filing by small entity, if applicable. Verified Small Entity Statement
must also be filed (Note 37 CFR 1.9, 1.27, 1.28) (check if applicable).   □   $0.00

SUBTOTAL =                                                                   $860.00

Processing fee of $130.00 for furnishing the English translation later than
□ 20 □ 30 months from the earliest claimed priority date (37 CFR 1.492(f)).   $0.00

TOTAL NATIONAL FEE =                                                         $860.00

Fee for recording the enclosed assignment (37 CFR 1.21(h)). The assignment must be
accompanied by an appropriate cover sheet (37 CFR 3.28, 3.31) (check if applicable).   □   $0.00

TOTAL FEES ENCLOSED =                                                        $860.00

Amount to be refunded:
Amount to be charged:

☒ A check in the amount of $860.00 to cover the above fees is enclosed.

□ Please charge my Deposit Account No. ________ in the amount of ________ to cover the above fees.
  A duplicate copy of this sheet is enclosed.

The Commissioner is hereby authorized to charge any fees which may be required, or credit any overpayment
to Deposit Account No. 02-1818. A duplicate copy of this sheet is enclosed.



NOTE: Where an appropriate time limit under 37 CFR 1.494 or 1.495 has not been met, a petition to revive (37 CFR
1.137(a) or (b)) must be filed and granted to restore the application to pending status.



SEND ALL CORRESPONDENCE TO: 



William E. Vaughan, Esq. 
Bell, Boyd & Lloyd LLC 
P.O. Box 1135 

Chicago, Illinois 60690-1135 
Tel: (312) 807-4292 



SIGNATURE:
NAME: William E. Vaughan
REGISTRATION NUMBER: 39,056
DATE: April 26, 2001






Rec'd mmo 27 juh 20 



BOX PCT 



IN THE UNITED STATES ELECTED/DESIGNATED OFFICE 
OF THE UNITED STATES PATENT AND TRADEMARK OFFICE 
UNDER THE PATENT COOPERATION TREATY-CHAPTER I 



PRELIMINARY AMENDMENT 



APPLICANT: Marcin Kuropatwinski
DOCKET NO: 112740-213
SERIAL NO: 09/830,497
GROUP ART UNIT:
EXAMINER:
INTERNATIONAL APPLICATION NO: PCT/DE00/02917
INTERNATIONAL FILING DATE: 25 August 2000
INVENTION: METHOD FOR THE IDENTIFICATION OF SPEAKERS ON THE BASIS OF THEIR VOICES

Assistant Commissioner for Patents,
Washington, D.C. 20231

Sir:

Please amend the above-identified International Application before entry into
the National stage before the U.S. Patent and Trademark Office under 35 U.S.C. §371
as follows:

In the Specification:

Please replace the Specification of the present application, including the
Abstract, with the following Substitute Specification:

SPECIFICATION

TITLE

METHOD FOR THE IDENTIFICATION OF SPEAKERS ON THE BASIS OF THEIR VOICES



BACKGROUND OF THE INVENTION 



Field of the Invention 



The present invention generally relates to a method for the identification of 
speakers on the basis of their voices, wherein the method is robust, safe, secure and 
reliable. 

Description of the Prior Art 

The problem of speaker identification is to distinguish between different
speakers or to check the predetermined speaker identity, with the only input
information being the recording of the voice of the speaker.

Problems have arisen in that the access system can be outwitted
when the voice and the keyword are recorded by third parties.

Moreover, when complex probability distributions for the speech
parameters of a speaker are stored, a compromise must be made between accuracy
and memory requirement. For this reason, methods for storage of the probability
distributions have been proposed which can be used as a function of the number of
speakers.

Until now, the speaker has been identified, for example, with the aid of
hidden Markov models or by vector quantization; see reference [1].

The speech signal parameters used in the past, such as cepstral AR
parameters, do not provide a satisfactory solution to speaker identification
problems. For this reason, other parameters need to be used, such as parameters
relating to the excitation of the vocal tract, which include information that is
dependent on the speaker and is at the same time largely phoneme-independent.

SUMMARY OF THE INVENTION 
In light of the above, the present invention provides a method for estimation
of the probability distribution of the coder parameters for the respective speaker, as
well as a method which prevents the access system from being outwitted. In
addition, the present invention solves the problem of speaker identification based
on the parameters of an analysis-by-synthesis coder using linear prediction
(LPAS) [1] (for example, a harmonic vector excited codec [5] or a waveform
interpolation codec [4]).

Accordingly, the present invention is first directed to a method for
identifying speakers on the basis of their voices, wherein the method includes a
preparation phase, a simulated usage phase of the training phase, and a usage phase.

In the preparation phase, the method includes the steps of: segmenting into first
speech signal frames of a given length, a number of text-dependent or
text-independent reference spoken expressions, from a number of speakers, which form
a speaker-related training statement; supplying the first speech signal frames to an
analysis-by-synthesis coder based on linear predictions; calculating a first
short-term predictor parameter, a first long-term predictor parameter and/or a first
excitation parameter for the coder, in the analysis-by-synthesis coder for each of the
number of speakers and for each first speech signal frame, wherein the parameters form
speaker-related training material; calculating the frequency of the respective
occurrence of the first parameters in the speaker-related training statement and/or
the probability densities with which the first parameters are contained in the
speaker-related training statement, in the analysis-by-synthesis coder for each of the
number of speakers and for each first speech signal frame; and storing the
calculated frequencies and/or the probability densities on a speaker-related basis as
speaker data.

In the simulated usage phase, the method includes the steps of:
segmenting into second speech signal frames of a given length, a text-dependent
or a text-independent simulation spoken expression of a given speaker; supplying
the second speech signal frames to the analysis-by-synthesis coder; calculating a
second short-term predictor parameter, a second long-term predictor parameter
and/or a second excitation parameter for the coder, the calculation being performed
in the analysis-by-synthesis coder for the given speaker and for each second speech
signal frame; calculating first probability scores for each second speech
signal frame from the calculated second parameters and the speaker data stored for
the given speaker in the preparation phase, the first probability scores indicating a
probability with which the second parameters match the first parameters;
combining the first probability scores from all the second speech signal frames; and
checking to determine whether the combined first probability scores are greater
than a predetermined first threshold, wherein, when the combined first probability
scores are greater than the predetermined first threshold, the voice of the given
speaker is confirmed, and when the combined first probability scores are less than or
equal to the predetermined first threshold, the preparation phase continues for
further reference spoken expressions by the given speaker until the voice of the
given speaker is confirmed.

In the usage phase, the method includes the steps of: segmenting a text-dependent
or a text-independent used spoken expression of the given speaker into third speech
signal frames of a given length; supplying the third speech signal frames to the
analysis-by-synthesis coder; calculating a third short-term predictor parameter, a
third long-term predictor parameter and/or a third excitation parameter for the coder,
in the analysis-by-synthesis coder for the given speaker and for each third speech
signal frame; calculating second probability scores for each third speech signal frame
from the calculated third parameters and the speaker data stored for the given
speaker in the preparation phase, the second probability scores indicating a
probability with which the third parameters have been spoken by the given speaker;
combining the second probability scores from all the third speech signal frames; and
checking to determine whether the combined second probability scores are greater than
a predetermined second threshold, wherein, when the combined second probability
scores are greater than the predetermined second threshold, the voice of the given
speaker is identified, and when the combined second probability scores are less than
or equal to the predetermined second threshold, the voice of the given speaker is not
identified.

In an embodiment of the method, a harmonic vector excited predictive coder
and a waveform interpolating coder are used as parametric coders.

In an embodiment of the method, an LPAS coder is used as the
analysis-by-synthesis coder.

In an embodiment, the method further includes the step of quantizing the
frequencies and/or the probability densities using a vector quantizer having a
specific and considerably reduced number of bits.

In an embodiment, the method further includes the step of entering noise
which is known to the speaker identification system when the spoken expression of
a speaker is entered into the speaker identification system.

In an embodiment, the method further includes the step of subtracting the
entered noise internally, before the segmentation, from the recording of the
speaker's voice.

Additional features and advantages of the present invention are described in, 
and will be apparent from, the following Detailed Description of the Preferred 
Embodiments and the Drawings. 

DESCRIPTION OF THE DRAWINGS 

Figure 1 shows, in generally diagrammatic form, the object of speaker 
identification; 

Figure 2 is a general flowchart of the basic considerations in speaker 
verification; 

Figure 3 schematically illustrates the synthesis model of a CELP coder; 
Figure 4 shows a schematic illustration of the various parameter groups of 
an LPAS coder; 

Figure 5 shows, in block diagram form, speaker identification using the
parameters of an LPAS coder;

Figure 6 shows, in block diagram form, speaker identification using the
parameters of an LPAS coder, wherein probability densities are stored together with
the code vectors for the parameters;

Figure 7 shows in detailed flowchart form speaker verification using the 
parameters of an LPAS coder; and 

Figures 8a-8m show in flowchart form an exemplary embodiment of all 
phases of the method of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Speaker identification 

In systems for speaker identification, statistical principles [2] are used to
check whether the spoken sentence has been spoken by one of the speakers covered
by the speaker identification system. There are, in principle, two
types of speaker identification systems: text-dependent systems and
text-independent systems. For the procedure described in the present invention, text
independence of the system is achieved via an expanded training phase in which the
speaker has to record a wide range of material, and the probability distributions of
the speech signal parameters are established from all the spoken material. A
text-dependent system can be trained more easily since the material which is
spoken by the speaker during the usage phase is limited to a number of keywords or
specific sentences. The preparation phase is continued until the system reliably
identifies the voice of the speaker. An exemplary embodiment of all phases of the
method of the present invention is summarily depicted and described in Figures
8a-8m, to which the entirety of the following disclosure relates.

The object of speaker identification is illustrated in Figure 1 (Problem of
speaker identification).

Speaker identification is treated as a multiple-hypothesis detection
problem [2]. The classes to be distinguished between, one for each speaker who is
intended to be identified by the system, are referred to as $sp_i$, $i = 1..M$, where
$M$ is the number of speakers covered by the speaker identification system. Speaker
identification is based on recorded signals spoken by the respective speaker. The
speech signal is segmented into signal frames $\mathbf{x} = [x(1)..x(K)]$ ($K = 160$
for a signal frame with a length of 20 ms and a sampling frequency of 8 kHz, for
example). The segmentation process produces the speech signal frames
$\mathbf{x}(1)..\mathbf{x}(N)$, where $N$ depends on the total length of the sentence
or keyword spoken by the speaker. The decision on the speaker is made from the
probabilities or probability densities (referred to jointly as probability scores)
that the vectors of the samples $\mathbf{x}(l)$, $l = 1..N$, belong to the class
$sp_i$. The statistically optimum decision scheme selects that class $sp_j$ having the
highest probability value for the given $\mathbf{x}(l)$, $l = 1..N$. As such, the
vectors $\mathbf{x}(l)$ are assigned to the class $sp_j$ for which:

$$p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_j) > p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_i) \quad \text{for all } i \neq j$$
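
The segmentation and the decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the disclosure: the function names `segment` and `identify` are hypothetical, and the sketch assumes that the frames are statistically independent, so that the joint log-probability of all frames is the sum of the per-frame log scores.

```python
import numpy as np

def segment(signal, frame_len=160):
    """Split a 1-D signal into non-overlapping frames of frame_len samples.

    K = 160 corresponds to 20 ms at 8 kHz. Trailing samples that do not
    fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

def identify(frame_log_scores):
    """Pick the speaker class with the highest combined score.

    frame_log_scores[j][l] is log p(x(l) | sp_j). Under the (assumed)
    independence of frames, the joint log-probability is the row sum.
    """
    totals = np.sum(np.asarray(frame_log_scores, dtype=float), axis=1)
    return int(np.argmax(totals)), totals

# One second of audio at 8 kHz yields N = 50 frames of K = 160 samples.
frames = segment(np.zeros(8000))
best, totals = identify([[-1.0, -2.0], [-0.5, -0.5]])
```

Working with log scores rather than raw probabilities avoids numerical underflow when N is large.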

Speaker verification 

The problem of speaker verification is to check the predetermined identity
of the speaker on the basis of his/her voice. This corresponds to the situation
illustrated in Figure 2 (Problem of speaker verification).

The process of speaker verification is carried out in a similar manner to that
of speaker identification, wherein the spoken sentence is likewise segmented.
However, after this, the voice is not classified, but a probability score is calculated
for the predetermined speaker identity and is compared with a threshold. The
identity of the speaker is thus confirmed on the basis of his/her voice when:

$$p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_j) > \text{threshold}$$

where $sp_j$ corresponds to the predetermined speaker identity. The threshold must be
set sufficiently high to avoid the situation in which a speaker with a different
identity to that predetermined is accepted/authorized.
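
The threshold test can be illustrated as follows. The sketch is hedged: the function name `verify`, the probability floor, and the assumption that per-frame scores combine by multiplication (independent frames) are illustrative choices that the text does not prescribe.

```python
import math

def verify(frame_probs, threshold, floor=1e-300):
    """Accept the claimed identity if the combined score exceeds the threshold.

    frame_probs holds p(x(l) | sp_j) for each frame l of the claimed
    speaker sp_j. The product of the scores is compared with the threshold
    in the log domain; the floor (an assumption) avoids log(0).
    """
    log_score = sum(math.log(max(p, floor)) for p in frame_probs)
    return log_score > math.log(threshold)

accepted = verify([0.9, 0.8], threshold=0.5)   # 0.72 > 0.5
rejected = verify([0.9, 0.5], threshold=0.5)   # 0.45 <= 0.5
```

Raising the threshold lowers the chance of accepting an impostor at the cost of rejecting the genuine speaker more often.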
LPAS coder

The speech coding methods used nowadays are predominantly based on the
analysis-by-synthesis method using an LPC synthesis filter [2]. In these methods,
speech coding is optimized by repetition of the coding and decoding operations
until the optimum parameter set is found for the given speech section.

One of the most widely used types of LPAS coder is the CELP coder. One
relatively new development is the harmonic vector excited codec, where the form
of the excitation signals is particularly suitable for the described task. Figure 3
(Design of an LPAS Coder) illustrates the synthesis model of a CELP coder. The
synthesis model defines the method for calculating the synthesized speech signal
from the quantized parameters of the speech signal. In general, each LPAS coder
has the following parameter groups (see also Figure 4):

• Short-term predictor parameters. The short-term predictor parameters are
generally calculated via classical LPC analysis, using the correlation method or
the covariance method for linear prediction [3]. 8-10 LPC coefficients are used
for signal frames with a length of 20 to 30 ms and a
sampling rate of 8 kHz. The short-term predictor parameters may occur in
various forms (for example, as reflection coefficients or as line
spectrum frequencies, LSF), depending on which representation can be
quantized better. It has been found that the LSF coefficients are most suitable for
quantization, and this form of prediction coefficients is generally used. The
short-term predictor parameters are calculated using an open-loop procedure,
that is to say without the overall optimization, illustrated in Figure 1, with the
other parameters relating to the synthesis error.

• Long-term predictor parameters. Long-term predictor parameters are used in a
filter which synthesizes the fundamental frequency of the speech signal. This is
generally a long-term predictor with a filter coefficient and a parameter for the
fundamental period of the voice signal. A long-term predictor with the
parameters b = [b,N] is a part of Figure 2. The long-term predictor parameters
are likewise calculated using an open-loop procedure, without overall
optimization with the other parameters. In some coders, a refined search is
carried out based on the long-term predictor parameters using a
closed-loop procedure.

• The excitation parameters. The 5-10 ms subframes of the remaining signal are
vector-quantized using a closed-loop procedure in a CELP coder. The
transmitted parameters allow the signal forms to be reproduced at the decoder
end from the stored codebook.

In an HVXC codec, the output from the LPC analysis filter is transformed 
to the frequency domain and the fundamental-period-normalized spectral envelope 
is vector-quantized. 

Speaker identification using the parameters of an LPAS coder 






The speech coder parameters provide a comprehensive description of the
possible speech signals using considerably fewer parameters than when the speech
signal is represented as a sequence of samples.

The decomposition of the speech signal into parameter groups can be used
in various ways for speaker identification. The methods for calculation of the
parameters and synthesis of the speech signal imply probability density estimation
methods (for example the probabilities of the parameters, which are regarded as
discrete random variables). Those parameters defined using a closed-loop procedure
should actually be regarded as discrete random variables, since volumes cannot be
associated with the parameter space regions of the vector quantizer for parameters
such as these. This relates in particular to the excitation parameters. The probability
distributions for such parameters are estimated by calculating relative frequencies
of the parameters/code vectors in the training statement.

Those parameters which are calculated using an open-loop procedure in the coder are
initially available in a non-quantized form and are quantized only after this, with
vector quantization generally being used. For parameters such as these, the
probability densities can be estimated from the training statement. This approach is
used primarily for the short-term predictor parameters.

The probability density estimation is based on the histogram method [6].
This method requires knowledge of the volumes of those regions of the parameter
space which are linked to the quantized points.

A method for storage of probability distributions is obtained, according to
Figure 5 (Speaker identification using the parameters of an LPAS coder), if the
possible code vectors for the speech signal parameters are stored once for the entire
population, which corresponds to the situation where the quantization steps/code
vectors are determined once, from the database which contains the recordings by a
large number of speakers. The probability distributions of the parameters for the
speakers are then stored together with the indices of the code vectors for the
parameters in the system. This is suitable for large systems with a very large
number of users (ATM, access systems in companies). In this regard, see also Figure
7.

Another method is for the code vectors for the parameters for each speaker
to be trained individually. The code vectors are then stored together with the values
of the probability densities at those parameter space points defined by the code
vectors. One possible way of carrying out this method is shown in Figure 6
(speaker identification using the parameters of an LPAS coder, probability densities
are stored together with the code vectors for the parameters). This method is
intended for a small number of speakers (for example, for a voice-controlled door
in a dwelling).

Training phase of a speaker identification system 

The probability density distributions for the speaker classes are estimated
from the training material. For text-dependent speaker identification (speaker
identification/speaker verification), a specific sentence or keyword is repeated
during the training phase until the speaker identification operates reliably.

Phonetically balanced spoken material must be recorded for
text-independent speaker verification. In this case as well, the training phase must be
repeated until the speaker identification/verification operates reliably.

The material recorded during the training phase is in each case used with a
phase shift a number of times for training, in order to make the speaker
identification system independent of the initial phase of the recorded voices. The
data used for training are referred to as the training statement $TS_{sp_i}$, with $sp_i$
symbolizing the speaker.
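
One way to read the repeated use of the material "with a phase shift" is to segment the same recording several times with different starting offsets. The sketch below follows that reading; the function name and the particular offsets are assumptions chosen for illustration and are not taken from the text.

```python
def shifted_frames(signal, frame_len=160, shifts=(0, 40, 80, 120)):
    """Segment one recording several times, each time with a different offset.

    Every offset in shifts (in samples, hypothetical values) yields its own
    sequence of non-overlapping frames; all frames are pooled into one list
    that serves as training material.
    """
    frames = []
    for shift in shifts:
        usable = (len(signal) - shift) // frame_len
        for i in range(usable):
            start = shift + i * frame_len
            frames.append(signal[start:start + frame_len])
    return frames

# 400 samples with offsets 0 and 80 give 2 + 2 frames of 160 samples each.
pooled = shifted_frames(list(range(400)), frame_len=160, shifts=(0, 80))
```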
Estimation of the probability densities

In order to describe the method according to the present invention for
estimation of the probability densities of the parameters for the speaker classes, a
number of necessary definitions first will be introduced.
The introduced abstraction of the coding process has the advantage that the
estimation of the probability densities can be described in a simple manner without
needing to go into the details of the highly complicated operations in the speech
coder. A detailed description of the parameter calculation process can be found in
[4] and [5].

A speech coder operates in evaluation intervals. The operations described in
the section on the LPAS coder are carried out in the speech coder for each signal
frame, and supply the parameters of the speech signal for the respective frame.

Calculation of a non-quantized parameter vector $p$ from the signal frame $\mathbf{x}$ is
written as $p = K_p(\mathbf{x})$ in an open-loop optimization procedure. The quantization of
the parameter is referred to as $\hat{p} = Q_p(p)$. That region in the parameter space of the
parameter $p$ which is mapped onto the code vector $\hat{p}$ in the coding process is
referred to as $S_{\hat{p}} = \{p : Q_p(p) = \hat{p}\}$. The volume of this region is referred to as
$V(S_{\hat{p}})$.

The set of possible code vectors for the parameter $p$ is written as
$C_p = \{\hat{p}_i;\ i = 1..N_p\}$, where $N_p$ is the number of code vectors. The set of regions
which are linked to the code vectors is referred to as $R_p = \{S_i;\ i = 1..N_p\}$. The
association function for a region $S_j$ is referred to as:

$$1_{S_j}(p) = \begin{cases} 1 & \text{if } p \in S_j \\ 0 & \text{otherwise} \end{cases}$$

The frequency of occurrence of a parameter in the training statement is
calculated using

$$f_j = \frac{\text{Number of parameter values from the training statement } TS_{sp_i} \text{ which occur in the region } S_j}{\text{Number of parameter values from the training statement } TS_{sp_i}}$$

The estimated probability density distribution then becomes:

$$p(p \mid sp_i) = \sum_{j=1}^{N_p} \frac{f_j}{V(S_j)} \, 1_{S_j}(p)$$
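
The histogram estimate can be made concrete for the simplest case of a scalar parameter, where each quantizer region is an interval and its volume is the interval width. This is a sketch under that assumption; for the vector-quantized parameters of the text the regions are multidimensional cells, and the function names here are hypothetical.

```python
import numpy as np

def histogram_density(train_values, edges):
    """Histogram density estimate for a scalar parameter.

    edges are the boundaries of the quantizer regions S_j. The density in
    region j is the relative frequency f_j divided by the region volume
    (here: the interval width). All training values are assumed to lie
    inside the range covered by edges.
    """
    train_values = np.asarray(train_values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    counts, _ = np.histogram(train_values, bins=edges)
    freqs = counts / len(train_values)   # relative frequencies f_j
    volumes = np.diff(edges)             # region volumes
    return freqs / volumes

def density_at(p, edges, density):
    """Evaluate the estimate at p: the indicator selects exactly one region."""
    j = int(np.searchsorted(edges, p, side="right")) - 1
    if j < 0 or j >= len(density):
        return 0.0                       # p lies outside all regions
    return float(density[j])

edges = [0.0, 0.5, 1.0, 2.0]
density = histogram_density([0.1, 0.2, 0.6, 1.5], edges)
value = density_at(0.3, edges, density)
```

Because each relative frequency is divided by the region volume, the estimate integrates to one over the covered range, as a density must.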



Estimation of the probabilities

The probability functions (probability mass functions) are estimated for
those parameters which are regarded as discrete random variables, that is to say
in particular the excitation from the codebook, which is optimized using a
closed-loop procedure, and the fundamental period of the speech signal. These probability
functions are defined as the frequencies of the given parameter codes in the training
statement for the respective speaker.
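
For a discrete parameter, the estimate reduces to counting code indices. A minimal sketch follows, with a hypothetical function name and an assumed representation of the training statement as a list of integer codebook indices:

```python
from collections import Counter

def estimate_pmf(code_indices, codebook_size):
    """Relative frequency of each code index in the training statement.

    code_indices are the indices (0..codebook_size-1) that the coder chose
    for one speaker; the result is that speaker's probability mass function.
    """
    counts = Counter(code_indices)
    total = len(code_indices)
    return [counts.get(i, 0) / total for i in range(codebook_size)]

pmf = estimate_pmf([0, 2, 2, 3], codebook_size=4)
```

Codes that never occur in the training statement receive probability zero, which is one reason a wide range of training material matters.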
Storage of the probability distributions

The speech parameters are not all calculated at the same time, but
successively, in a speech coder. For example, the short-term predictor parameters
are calculated first, and the remaining parameters are then optimized, for the already
known short-term predictor parameters, with regard to the synthesis error or the
prediction error. This allows effective storage of the probability distributions as
conditional probabilities of the code vectors in a tree structure. This is possible due
to the following relationship:

$$P(p_K, p_L, p_A \mid sp_i) = P(p_K \mid sp_i)\, P(p_L \mid sp_i, p_K)\, P(p_A \mid sp_i, p_K, p_L)$$

where

$p_K$   Vector for a short-term parameter
$p_L$   Vector for a long-term parameter
$p_A$   Vector for an excitation parameter

A major simplification can be achieved if the speech parameters within a
signal frame can be assumed to be statistically independent. The above formula
then becomes:

$$P(p_K, p_L, p_A \mid sp_i) = P(p_K \mid sp_i)\, P(p_L \mid sp_i)\, P(p_A \mid sp_i)$$
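
The difference between the two factorizations can be sketched for one speaker as follows. The nested-dictionary layout and the function names are illustrative assumptions; the text only states that the conditional probabilities are stored in a tree structure.

```python
def joint_from_tree(tree, k, l, a):
    """Chain-rule probability from a tree of conditional tables.

    tree[k] = (P(k), {l: (P(l | k), {a: P(a | k, l)})}), where k, l and a
    are code indices of the short-term, long-term and excitation parameters.
    """
    p_k, children_l = tree[k]
    p_l_given_k, children_a = children_l[l]
    return p_k * p_l_given_k * children_a[a]

def joint_independent(p_k, p_l, p_a, k, l, a):
    """Simplified probability when the three parameters are independent."""
    return p_k[k] * p_l[l] * p_a[a]

def table_sizes(n_k, n_l, n_a):
    """Number of stored values: fully populated tree vs. three flat tables."""
    tree_entries = n_k + n_k * n_l + n_k * n_l * n_a
    flat_entries = n_k + n_l + n_a
    return tree_entries, flat_entries

tree = {0: (0.5, {1: (0.4, {2: 0.25})})}
p_tree = joint_from_tree(tree, 0, 1, 2)
sizes = table_sizes(4, 3, 2)
```

The size comparison shows why the independence assumption is described as a major simplification: the flat tables grow with the sum of the codebook sizes, whereas a fully populated tree grows with their product.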



The probability densities need to be stored at a very large number of points
in parameter space in the system. The number of bits used for storing probability
densities is critical to the complexity of the overall system. A vector quantizer is
therefore used for the probability values. This makes it possible to reduce the
number of bits used for storing the probability distributions.
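
The saving in storage can be illustrated with a deliberately simplified stand-in: instead of the vector quantizer named in the text, the sketch below quantizes each probability value on its own against a small logarithmically spaced codebook, so that only a short index has to be stored per value. The function names, the number of bits and the floor value are assumptions.

```python
import numpy as np

def build_codebook(bits, floor=1e-6):
    """2**bits logarithmically spaced levels between floor and 1."""
    return np.logspace(np.log10(floor), 0.0, num=2 ** bits)

def quantize(probs, codebook):
    """Map each probability to the index of the nearest level (log domain)."""
    probs = np.clip(np.asarray(probs, dtype=float), codebook[0], codebook[-1])
    distance = np.abs(np.log(probs)[:, None] - np.log(codebook)[None, :])
    return distance.argmin(axis=1)

def dequantize(indices, codebook):
    """Recover the approximate probability values from the stored indices."""
    return codebook[indices]

codebook = build_codebook(bits=4)            # 16 levels, 4 bits per value
indices = quantize([1.0, 0.03, 0.0], codebook)
approx = dequantize(indices, codebook)
```

With 4 bits per value instead of a 32-bit float, the table shrinks by a factor of eight, at the cost of a bounded relative error in each stored probability.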
System safety and security

In order to prevent the system from being outwitted, noise which is known
to the system is transmitted at the same time that the voice of the speaker is being
recorded, and this noise is subtracted from the digitized speech signal.
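
Why the known noise hinders a replay can be sketched numerically. The example assumes an idealized setting that the text does not specify: the noise adds linearly to the speech, the system knows it sample by sample, and recording and noise are perfectly time-aligned. The function name and the test signals are hypothetical.

```python
import numpy as np

def subtract_known_noise(recording, known_noise):
    """Remove the noise that the system itself emitted during recording."""
    recording = np.asarray(recording, dtype=float)
    known_noise = np.asarray(known_noise, dtype=float)
    if recording.shape != known_noise.shape:
        raise ValueError("recording and noise must have the same length")
    return recording - known_noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(160) / 8000)
noise_now = rng.standard_normal(160)   # noise emitted in the current session
noise_old = rng.standard_normal(160)   # noise captured in an earlier session

# Live speaker: the current noise cancels and clean speech remains.
live = subtract_known_noise(speech + noise_now, noise_now)
# Replayed recording: it carries the old noise, which does not cancel.
replayed = subtract_known_noise(speech + noise_old, noise_now)

live_error = np.mean((live - speech) ** 2)
replay_error = np.mean((replayed - speech) ** 2)
```

In this idealized case the live signal is recovered essentially exactly, while the replayed signal stays corrupted, which would be expected to lower its probability score.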
The present invention can be used for access control applications, such as
voice-controlled doors, or for verification; for example, for bank access systems.
The procedure can be implemented as a program module on a processor which
carries out the task of speaker identification in the system.

Although the present invention has been described with reference to specific
embodiments, those of skill in the art will recognize that changes may be made
thereto without departing from the spirit and scope of the invention as set forth in
the hereafter appended claims.

[1] S. Furui, "Recent advances in speaker recognition", Pattern Recognition
    Letters, Tokyo Inst. of Technol., 1997
[2] P. Vary, U. Heute, W. Hess, Digitale Sprachsignalverarbeitung [Digital
    speech signal processing], B.G. Teubner, Stuttgart, 1998
[3] K. Kroschel, Statistische Nachrichtentheorie [Statistical information
    theory], 3rd ed., Springer-Verlag, 1997
[4] W.B. Kleijn, K.K. Paliwal, Speech Coding and Synthesis, Elsevier, 1995
[5] ISO/IEC 14496-3, MPEG-4 HVXC Speech Coder description
[6] Prakasa Rao, Functional Estimation, Academic Press, 1982
ABSTRACT OF THE DISCLOSURE 
A method for speaker identification using parameters of an LPAS coder or 
of a parametric coder for modeling the probability distribution for the speaker 
classes. 
In the claims: 

On page 12, cancel line 1, and substitute the following left-hand justified 
heading therefor: 
I Claim as My Invention: 
Please cancel claims 1-6, without prejudice, and substitute the following 
claims therefor: 

7. A method for the identification of speakers on the basis of the 
speakers' respective voices, the method comprising the steps of: 

segmenting, in a preparation phase, into first speech signal frames of a 
given length, a plurality of one of text-dependent and text-independent reference 
spoken expressions, from a plurality of speakers, which form a speaker-related 
training statement; 

supplying the first speech signal frames, in the preparation phase, to an 
analysis-by-synthesis coder based on linear predictions; 

calculating, in the preparation phase, at least one of a first short-term 
predictor parameter, a first long-term predictor parameter and a first excitation 
parameter for the coder in the analysis-by-synthesis coder for each of the plurality 
of speakers and for each first speech signal frame in each case, wherein the 
parameters form speaker-related training material; 

calculating, in the preparation phase, at least one of a frequency of a 
respective occurrence of the first parameters in the speaker-related training 
statement and probability densities with which the first parameters are contained in 
the speaker-related training statement, the calculation being performed in the 
analysis-by-synthesis coder for each of the plurality of speakers and for each first 
speech signal frame in each case; 

storing, in the preparation phase, at least one of the calculated frequencies 
and the probability densities on a speaker-related basis as speaker data; 

segmenting, in a simulated usage phase of the training phase, into second 
speech signal frames of a given length, one of a text-dependent and a 
text-independent simulation spoken expression of a given speaker; 

supplying, in the simulated usage phase, the second speech signal frames to 
the analysis-by-synthesis coder; 

calculating, in the simulated usage phase, at least one of a second short-term 
predictor parameter, a second long-term predictor parameter and a second 
excitation parameter for the coder, the calculation being performed in the 
analysis-by-synthesis coder for the given speaker and for every other speech signal 
frame in each case; 

calculating, in the simulated usage phase, first probability hits for every 
other speech signal frame from the calculated second parameters and the speaker 
data stored for the given speaker in the preparation phase, the probability hits 
indicating a probability with which the second parameters match the first 
parameters; 

combining, in the simulated usage phase, the first probability scores from 
all the second speech signal frames; 

checking, in the simulated usage phase, to determine whether the combined 
first probability scores are greater than a predetermined first threshold which 
confirms the voice of the given speaker, when the combined first probability scores 
are greater than the predetermined first threshold, the voice of the given speaker is 
confirmed, and when the combined first probability scores are less than or equal to 
the predetermined first threshold, the preparation phase continues for further 
reference spoken expressions by the given speaker until the voice of the given 
speaker is confirmed; 

segmenting into third speech signal frames of a given length, in a usage 
phase, one of a text-dependent and a text-independent used spoken expression of 
the given speaker; 

supplying, in the usage phase, the third speech signal frames to the 
analysis-by-synthesis coder; 

calculating, in the usage phase, at least one of a third short-term predictor 
parameter, a third long-term predictor parameter and a third excitation parameter 
for the coder, the calculation being performed in the analysis-by-synthesis coder for 
the given speaker and for every third speech signal frame in each case; 

calculating, in the usage phase, second probability hits for every third 
speech signal frame from the calculated third parameters and the speaker data 
stored for the given speaker in the preparation phase, the second probability hits 
indicating a probability with which the third parameters have been spoken by the 
given speaker; 

combining, in the usage phase, the second probability hits from all the third 
speech signal frames; and 

checking, in the usage phase, to determine whether the combined second 
probability scores are greater than a predetermined second threshold which 
identifies the voice of the given speaker, when the combined second probability hits 
are greater than the predetermined second threshold, the voice of the given speaker 
is identified, and when the combined second probability scores are less than or 
equal to the predetermined second threshold, the voice of the given speaker is not 
identified. 

8. A method for the identification of speakers on the basis of the 
speakers' respective voices as claimed in claim 7, wherein one of a harmonic vector 
excited predictive coder and a waveform interpolating coder is used as a parametric 
coder. 

9. A method for the identification of speakers on the basis of the 
speakers' respective voices as claimed in claim 7, wherein an LPAS coder is used as 
the analysis-by-synthesis coder. 

10. A method for the identification of speakers on the basis of the 
speakers' respective voices as claimed in claim 7, the method further comprising the 
step of: 

quantizing at least one of the frequencies and the probability densities using 
a vector quantizer having a specific and considerably reduced number of bits. 

11. A method for the identification of speakers on the basis of the 
speakers' respective voices as claimed in claim 7, the method further comprising the 
step of: 

entering noise which is known to the speaker identification system when the 
spoken expression of a speaker is entered into the speaker identification system. 

12. A method for the identification of speakers on the basis of the 
speakers' respective voices as claimed in claim 11, the method further comprising 
the step of: 

subtracting the entered noise internally, before the segmentation, from the 
recording of the speaker's voice. 

REMARKS 

The present amendment makes editorial changes and corrects typographical 
errors in the specification, which includes the Abstract, in order to conform the 
specification to the requirements of United States Patent Practice. No new matter is 
added thereby. Attached hereto is a marked-up version of the changes made to the 
specification by the present amendment. The attached page is captioned "Version 
With Markings To Show Changes Made". 

In addition, the present amendment cancels original claims 1-6 in favor of 
new claims 7-12. Claims 7-12 have been presented solely because the revisions by 
red-lining and underlining which would have been necessary in claims 1-6 in order 
to present those claims in accordance with preferred United States Patent Practice 
would have been too extensive, and thus would have been too burdensome. The 
present amendment is intended for clarification purposes only and not for 
substantial reasons related to patentability pursuant to 35 USC §§101, 102, 103 or 
112. Indeed, the cancellation of claims 1-6 does not constitute an intent on the part 
of the Applicant to surrender any of the subject matter of claims 1-6. 
VERSION WITH MARKINGS TO SHOW CHANGES MADE 

In The Specification: 

The Specification of the present application, including the Abstract, has been 
amended as follows: 
SPECIFICATION 

TITLE 

Method for identification of speakers on the basis of their voices 
METHOD FOR THE IDENTIFICATION OF SPEAKERS ON THE BASIS 
OF THEIR VOICES 
BACKGROUND OF THE INVENTION 

Description 

Field of the Invention 

The invention relates to a method for identification of speakers on the basis 
of their voices. 

The object on which the invention is based is to specify a method for 
identification of speakers on the basis of their voices, which method is robust, safe, 
secure and reliable. 

According to the invention, this object is achieved by the features specified 
in patent claim 1. 

The invention will be described in more detail in the following text using a 
flowchart. 

The present invention generally relates to a method for the identification of 
speakers on the basis of their voices, wherein the method is robust, safe, secure and 
reliable. 

Description of the Prior Art 

The invention allows the identification of the speaker on the basis of his 
voice. The problem of speaker identification is to distinguish between different 
speakers or to check the predetermined speaker identity, with the only input 
information being the recording of the voice of the speaker. 

Problems have developed relating to the access system being outwitted when 
the voice and the keyword are recorded by third parties. 

Moreover, when complex probability distributions for the speech 
parameters of a speaker are stored, a compromise must be made between accuracy 
and memory requirement. For this reason, methods for storage of the probability 
distributions have been proposed which can be used as a function of the number of 
speakers. 

Until now, the speaker has been identified, for example, with the aid of 
hidden Markov models or by vector quantization, see reference [1]. 

The invention solves the problem of speaker identification based on the 
parameters of an analysis-by-synthesis coder using linear prediction 
(LPAS) [1] (for example, a harmonic vector excited codec [5] or waveform 
interpolation codec [4]). The speech signal parameters used in the past, such as 
cepstral AR parameters, do not provide a satisfactory solution to speaker 
identification problems. For this reason, other parameters need to be used, 
such as parameters relating to the excitation of the vocal tract, which include 
information that is dependent on the speaker and is at the same time largely 
phoneme-independent. 

SUMMARY OF THE INVENTION 

In light of the above, the present invention provides a 
method for estimation of the probability distribution of the coder parameters for the 
respective speaker, as well as a method which prevents the access system 
from being outwitted. In addition, the present invention solves the problem of 
speaker identification based on the parameters of an analysis-by-synthesis coder 
using linear prediction (LPAS) [1] (for example, a harmonic vector excited codec 
[5] or waveform interpolation codec [4]). 
Accordingly, the present invention is first directed to a method for 
identifying speakers on the basis of their voices, wherein the method includes a 
preparation phase, a simulated usage phase of the training phase, and a usage phase. 
In the preparation phase, the method includes the steps of: segmenting into first 
speech signal frames of a given length, a number of text-dependent or 
text-independent reference spoken expressions, from a number of speakers, which form 
a speaker-related training statement; supplying the first speech signal frames to an 
analysis-by-synthesis coder based on linear predictions; calculating a first 
short-term predictor parameter, a first long-term predictor parameter and/or an excitation 
parameter for the coder, in the analysis-by-synthesis coder for each of the number 
of speakers and for each first speech signal frame, wherein the parameters form 
speaker-related training material; calculating the frequency of the respective 
occurrence of the first parameters in the speaker-related training statement and/or 
the probability densities with which the first parameters are contained in the 
speaker-related training statement, in the analysis-by-synthesis coder for each of the 
number of speakers and for each first speech signal frame; and storing the 
calculated frequencies and/or the probability densities on a speaker-related basis as 
speaker data. In the simulated usage phase, the method includes the steps of: 
segmenting into second speech signal frames of a given length, a text-dependent 
or a text-independent simulation spoken expression of a given speaker; supplying 
the second speech signal frames to the analysis-by-synthesis coder; calculating a 
second short-term predictor parameter, a second long-term predictor parameter 
and/or a second excitation parameter for the coder, the calculation being performed 
in the analysis-by-synthesis coder for the given speaker and for every other speech 
signal frame in each case; calculating first probability hits for every other speech 
signal frame from the calculated second parameters and the speaker data stored for 
the given speaker in the preparation phase, the probability hits indicating a 
probability with which the second parameters match the first parameters; 
combining the first probability scores from all the second speech signal frames; and 
checking to determine whether the combined first probability scores are greater 
than a predetermined first threshold which confirms the voice of the given speaker, 
when the combined first probability scores are greater than the predetermined first 
threshold, the voice of the given speaker is confirmed, and when the combined first 
probability scores are less than or equal to the predetermined first threshold, the 
preparation phase continues for further reference spoken expressions by the given 
speaker until the voice of the given speaker is confirmed. In the usage phase, the 
method includes the steps of: segmenting a text-dependent or a text-independent 
used spoken expression of the given speaker into third speech signal frames of a 
given length; supplying the third speech signal frames to the analysis-by-synthesis 
coder; calculating a third short-term predictor parameter, a third long-term predictor 
parameter and/or a third excitation parameter for the coder, in the 
analysis-by-synthesis coder for the given speaker and for every third speech signal frame in 
each case; calculating second probability hits for every third speech signal frame 
from the calculated third parameters and the speaker data stored for the given 
speaker in the preparation phase, the second probability hits indicating a probability 
with which the third parameters have been spoken by the given speaker; combining 
the second probability hits from all the third speech signal frames; and checking to 
determine whether the combined second probability scores are greater than a 
predetermined second threshold which identifies the voice of the given speaker, when the 
combined second probability hits are greater than the predetermined second 
threshold, the voice of the given speaker is identified, and when the combined 
second probability scores are less than or equal to the predetermined second 
threshold, the voice of the given speaker is not identified. 

In an embodiment of the method, a harmonic vector excited predictive coder 
or a waveform interpolating coder is used as a parametric coder. 

In an embodiment of the method, an LPAS coder is used as the 
analysis-by-synthesis coder. 

In an embodiment, the method further includes the step of quantizing the 
frequencies and/or the probability densities using a vector quantizer having a 
specific and considerably reduced number of bits. 
In an embodiment, the method further includes the step of entering noise 
which is known to the speaker identification system when the spoken expression of 
a speaker is entered into the speaker identification system. 

In an embodiment, the method further includes the step of subtracting the 
entered noise internally, before the segmentation, from the recording of the 
speaker's voice. 

Additional features and advantages of the present invention are described in, 
and will be apparent from, the following Detailed Description of the Preferred 
Embodiments and the Drawings. 

DESCRIPTION OF THE DRAWINGS 

Figure 1 shows, in generally diagrammatic form, the object of speaker 
identification; 

Figure 2 is a general flowchart of the basic considerations in speaker 
verification; 

Figure 3 schematically illustrates the synthesis model of a CELP coder; 

Figure 4 shows a schematic illustration of the various parameter groups of 
an LPAS coder; 

Figure 5 shows in block diagram form speaker identification using the 
parameters of an LPAS coder; 

Figure 6 shows in block diagram form speaker identification using the 
parameters of an LPAS coder, wherein probability densities are stored together with 
the code vectors for the parameters; 

Figure 7 shows in detailed flowchart form speaker verification using the 
parameters of an LPAS coder; and 

Figures 8a-8m show in flowchart form an exemplary embodiment of all 
phases of the method of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Speaker identification 

In systems for speaker identification, statistical principles [2] are used to 
check whether the spoken sentence has been spoken by one of the speakers covered 
by the speaker identification system. In the process, there are, in principle, two 
types of speaker identification systems: text-dependent systems and 
text-independent systems. For the procedure described in the present invention, text 
independence of the system is achieved via an expanded training 
phase in which the speaker has to record a wide range of material, and the 
probability distributions of the speech signal parameters are established from 
all the spoken material. A text-dependent system can be trained more easily since 
the material which is spoken by the speaker during the usage phase is 
limited to a number of keywords or specific sentences. The preparation phase is 
continued until the system reliably identifies the voice of the speaker. An 
exemplary embodiment of all phases of the method of the present invention is 
summarily depicted and described in Figures 8a-8m, to which the entirety of the 
following disclosure is associated. 

The object of speaker identification is illustrated in Figure 1. 

Speaker identification is dealt with as a problem relating to the detection of 
multiples [2]. The classes to be distinguished between, one for each speaker who is 
intended to be identified by the system, are referred to as $sp_i$, $i = 1..M$, where $M$ is the 
number of speakers covered by the speaker identification system. Speaker 
identification is based on recorded signals spoken by the respective speaker. The 
speech signal is segmented into the signal frames $\mathbf{x} = [x(1)..x(K)]$ ($K = 160$ for a 
signal frame with a length of 20 ms and a sampling frequency of 8 kHz, for 
example). The segmentation process produces the speech signal frames $\mathbf{x}(1)..\mathbf{x}(N)$, 
where $N$ depends on the total length of the sentence or keyword spoken by the 
speaker. The decision on the speaker is made from the probabilities or probability 
densities (referred to jointly as probability scores) that the vectors of the samples 
$\mathbf{x}(l)$, $l = 1..N$, belong to the class $sp_i$. The statistically optimum decision scheme 
selects that class $sp_i$ having the highest probability value for given $\mathbf{x}(l)$, $l = 1..N$. 
As such, the vectors $\mathbf{x}(l)$ are assigned to the class $sp_j$ for which: 

$$p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_j) > p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_i) \quad \text{for all } i \neq j$$
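
A minimal sketch of this decision rule in Python is given below. It assumes that a probability score is already available for each frame and each speaker class, and that the frames are treated as statistically independent, so that the joint score is the product of the per-frame scores (evaluated here as a sum of logarithms). The independence assumption, the function name identify_speaker and the example values are illustrative assumptions and are not taken from the description.

```python
import math


def identify_speaker(frame_scores):
    """Return the speaker class with the highest combined score.

    frame_scores maps a speaker label to a list of per-frame probability
    scores p(x(l) | sp_i), l = 1..N. All scores must be strictly positive.
    Assumption: frames are independent, so the joint score is a product,
    computed as a sum of logarithms to avoid numerical underflow.
    """
    def combined(scores):
        return sum(math.log(s) for s in scores)

    return max(frame_scores, key=lambda sp: combined(frame_scores[sp]))


# Hypothetical scores for two speaker classes and three frames.
example_scores = {
    "sp1": [0.2, 0.3, 0.1],
    "sp2": [0.4, 0.5, 0.3],
}
best = identify_speaker(example_scores)
print(best)
```

Working in the logarithmic domain does not change which class is selected, since the logarithm is monotonic.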



Speaker verification 

The problem of speaker verification is to check the predetermined identity 
of the speaker on the basis of his/her voice. This corresponds to the situation 
illustrated in Figure 2. 

The process of speaker verification is carried out in a similar manner to that 
of speaker identification, wherein the spoken sentence is likewise 
segmented. However, after this, the voice is not classified, but a probability score is 
calculated for the predetermined speaker identity and is compared with a threshold. 
The identity of the speaker is thus confirmed on the basis of his/her voice when: 

$$p(\mathbf{x}(1) \ldots \mathbf{x}(N) \mid sp_j) > \text{threshold}$$

where $sp_j$ corresponds to the predetermined speaker identity. The threshold must be 
set sufficiently high to avoid the situation in which a speaker with a different 
identity to that predetermined is accepted/authorized. 
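
The threshold test can be sketched in Python as follows. As an assumption made only for this illustration, the per-frame scores are combined by summing their logarithms, so the threshold is likewise expressed in the logarithmic domain; the function name verify_speaker and the numerical values are hypothetical.

```python
import math


def verify_speaker(frame_scores, log_threshold):
    """Confirm the claimed identity if the combined score exceeds the threshold.

    frame_scores holds the per-frame probability scores for the claimed
    speaker sp_j (all strictly positive). Assumption: frames are combined
    as a sum of log scores, so log_threshold is a log-domain threshold.
    """
    combined = sum(math.log(s) for s in frame_scores)
    # Strict inequality: a score equal to the threshold is not accepted.
    return combined > log_threshold


accepted = verify_speaker([0.5, 0.5], log_threshold=-1.6)
rejected = verify_speaker([0.1, 0.1], log_threshold=-1.6)
print(accepted, rejected)
```

Raising log_threshold makes false acceptance less likely at the cost of rejecting the true speaker more often, which is the trade-off described above.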
LPAS coder 

The speech coding methods used nowadays are predominantly based on the 
analysis-by-synthesis method using an LPC synthesis filter [2]. In these methods, 
speech coding is optimized by repetition of the coding and decoding operations 
until the optimum parameter set is found for the given speech section. 

One of the most widely used types of LPAS coder is the CELP coder. One 
relatively new development is the harmonic vector excited codec, where the form 
of the excitation signals is particularly suitable for the described task. Figure 3 
illustrates the synthesis model of a CELP coder. 
The synthesis model defines the method for calculating the synthesized speech 
signal from the quantized parameters of the speech signal. In general, each LPAS 
coder has the following parameter groups (see also Figure 4): 
• Short-term predictor parameters. The short-term predictor parameters are 
generally calculated via classical LPC analysis, using the 
correlation method or the covariance method for linear prediction [3]. 8-10 LPC 
coefficients are used for signal frames with a length of 20 to 30 ms and a 
sampling rate of 8 kHz. The short-term predictor parameters may occur in 
various forms (for example the reflection coefficients or in the form of line 
spectrum frequencies LSF), depending on the representation which can be 
quantized better. It has been found that the LSF coefficients are most suitable for 
quantization, and this form of prediction coefficients is generally used. The 
short-term predictor parameters are calculated using an open-loop procedure, 
that is to say without the overall optimization, illustrated in Figure 1, with the 
other parameters relating to the synthesis error. 

• Long-term predictor parameters. Long-term predictor parameters are used in a 
filter which synthesizes the fundamental frequency of the speech signal. This is 
generally a long-term predictor with a filter coefficient and a parameter for the 
fundamental period of the voice signal. A long-term predictor with the 
parameters $\mathbf{b} = [b, N]$ is a part of Figure 2. The long-term predictor parameters 
are likewise calculated using an open-loop procedure, without overall 
optimization with the other parameters. In some coders, a refined search is 
carried out based on the long-term predictor parameters using a 
closed-loop procedure. 

• The excitation parameters. The 5-10 ms subframes of the remaining signal are 
vector-quantized using a closed-loop procedure in a CELP coder. The 
transmitted parameters allow the signal forms to be reproduced at the decoder 
end from the stored codebook. 

In an HVXC codec, the output from the LPC analysis filter is transformed 
to the frequency domain and the fundamental-period-normalized spectral envelope 
is vector-quantized. 
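
The open-loop calculation of the short-term predictor parameters by the correlation method can be sketched in Python as follows. The Levinson-Durbin recursion used here to solve the resulting equations is a standard choice and is an assumption of this sketch, as are the function names and the synthetic test frame; the description above only names the correlation and covariance methods.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one signal frame (correlation method)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, reflection, error), where the predictor is
    x_hat[t] = sum(a[k - 1] * x[t - k] for k in 1..order),
    reflection holds the reflection coefficients and error is the
    remaining prediction error energy.
    """
    a = [0.0] * order
    reflection = []
    error = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / error
        reflection.append(k_m)
        updated = a[:]
        updated[m] = k_m
        for k in range(m):
            updated[k] = a[k] - k_m * a[m - 1 - k]
        a = updated
        error *= 1.0 - k_m * k_m
    return a, reflection, error


# Synthetic 20 ms frame at 8 kHz (K = 160): a decaying exponential,
# which a first-order predictor with coefficient 0.9 models well.
frame = [0.9 ** t for t in range(160)]
a, reflection, error = levinson_durbin(autocorrelation(frame, 10), 10)
print([round(c, 3) for c in a])
```

The recursion yields the reflection coefficients as a by-product, which is one of the representations of the short-term predictor parameters mentioned above; conversion to LSF is not shown.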

Speaker identification using the parameters of an LPAS coder 
The speech coder parameters provide a comprehensive description of the 
possible speech signals using considerably fewer parameters than when the speech 
signal is represented as a sequence of samples. 

The decomposition of the speech signal into the said parameter groups can 
be used in various ways for speaker identification. The methods for calculation of 
the parameters and synthesis of the speech signal imply probability density 
estimation methods (for example the probabilities of the parameters, which are 
regarded as discrete probability variables). Those parameters defined using a closed-loop 
procedure should actually be regarded as discrete probability variables, since it is 
impossible to link the volumes of the parameter space regions of the vector 
quantizer for parameters such as these. This relates in particular to the excitation 
parameters. The probability distributions for such parameters are estimated by 
calculating relative frequencies of the parameters/code vectors in the training 
statement. 

Those parameters which are calculated using an open-loop procedure in the coder are 
initially available in a non-quantized form and are quantized only after this, with 
vector quantization generally being used. For parameters such as these, the 
probability densities can be estimated from the training statement. This approach is 
used primarily for the short-term predictor parameters. 

The probability density estimation is based on the histogram method [6]. 
This method requires knowledge of the volumes of those regions of the parameter 
space which are linked to the quantized points. 

A method for storage of probability distributions is obtained, according to 
Figure 5 (Speaker identification using the parameters of an LPAS coder), if the 
possible code vectors for the speech signal parameters are stored once for the entire 
population, which corresponds to the situation where the quantization steps/code 
vectors are determined once, from the database which contains the recordings by a 
large number of speakers. The probability distributions of the parameters for the 
speakers are then stored together with the indices of the code vectors for the 
parameters in the system. This is suitable for large systems with a very large 
number of users (ATM, access systems in companies). In this regard, see also Figure 
7. 

Another method is for the code vectors for the parameters for each speaker 
to be trained individually. The code vectors are then stored together with the values 
of the probability densities at those parameter space points defined by the code 
vectors. One possible way of carrying out this method is shown in Figure 6 
(speaker identification using the parameters of an LPAS coder, probability densities 
are stored together with the code vectors for the parameters). This method is 
intended for a small number of speakers (for example, for a voice-controlled door 
in a dwelling). 

Training phase of a speaker identification system 

The probability density distributions for the speaker classes are estimated 
from the training material. For text-dependent speaker identification (speaker 
identification/speaker verification), a specific sentence or keyword is repeated 
during the training phase until the speaker identification operates reliably. 

Phonetically balanced spoken material must be recorded for 
text-independent speaker verification. In this case as well, the training phase must be 
repeated until the speaker identification/verification operates reliably. 

The material recorded during the training phase is in each case used with a 
phase shift a number of times for training, in order to make the speaker 
identification system independent of the initial phase of the recorded voices. The 
data used for training are referred to as the training statement $TS_{sp_j}$, with $sp_j$ 
symbolizing the speaker. 
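
The repeated use of the recorded material with a phase shift can be sketched in Python as follows. The sketch assumes that a phase shift means starting the frame segmentation at different sample offsets; the offsets 0, 40, 80 and 120 samples and the function names are hypothetical choices for this illustration, while the frame length of 160 samples follows the example given earlier for 20 ms frames at 8 kHz.

```python
def segment(signal, frame_len=160, offset=0):
    """Split a signal into consecutive frames of frame_len samples.

    Segmentation starts at the given sample offset; an incomplete
    frame at the end of the signal is discarded.
    """
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len]
            for s in range(offset, last_start + 1, frame_len)]


def phase_shifted_frames(signal, frame_len=160, shifts=(0, 40, 80, 120)):
    """Collect training frames from several shifted segmentations."""
    frames = []
    for shift in shifts:
        frames.extend(segment(signal, frame_len, shift))
    return frames


# Hypothetical 50 ms recording at 8 kHz (400 samples).
signal = list(range(400))
training_frames = phase_shifted_frames(signal)
print(len(training_frames))
```

Each shifted segmentation presents the same speech to the coder with different frame boundaries, so the estimated distributions depend less on where the first frame happens to start.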

Estimation of the probability densities 

In order to describe the method according to the present invention for 
estimation of the probability densities of the parameters for the speaker classes, a 
number of necessary definitions will first be introduced. 
The introduced abstraction of the coding process has the advantage that the 
estimation of the probability densities can be described in a simple manner without 
needing to go into the details of the highly complicated operations in the speech 
coder. A detailed description of the parameter calculation process can be found in 
[4] and [5]. 

A speech coder operates in evaluation intervals. The operations described in 
the section on the LPAS coder are carried out in the speech coder for each signal 
frame, and supply the parameters of the speech signal for the respective frame. 

Calculation of a non-quantized parameter vector $\mathbf{p}$ from the signal frame $\mathbf{x}$ is 
written as $\mathbf{p} = K_p(\mathbf{x})$ in an open-loop optimization procedure. The quantization of 
the parameter is referred to as $\hat{\mathbf{p}} = Q_p(\mathbf{p})$. That region in the parameter space of the 
parameter $\mathbf{p}$ which is mapped onto the code vector $\hat{\mathbf{p}}$ in the coding process is 
referred to as $S_{\hat{p}} = \{\mathbf{p} : Q_p(\mathbf{p}) = \hat{\mathbf{p}}\}$. The volume of this region is referred to as 
$V(S_{\hat{p}})$. 

The set of possible code vectors for the parameter $\mathbf{p}$ is written as 
$C_p = \{\hat{\mathbf{p}}_i;\ i = 1..N_p\}$, where $N_p$ is the number of code vectors. The set of regions 
which are linked to the code vectors is referred to as $R_p = \{S_i;\ i = 1..N_p\}$. The 
association function for a region $S_i$ is referred to as: 

$$1_{S_i}(\mathbf{p}) = \begin{cases} 1, & \mathbf{p} \in S_i \\ 0, & \text{otherwise} \end{cases}$$

The frequency of occurrence of a parameter in the training statement is 
calculated using 

$$f_{S_i} = \frac{\text{number of parameter values from the training statement } TS_{sp_j} \text{ which occur in the region } S_i}{\text{number of parameter values from the training statement } TS_{sp_j}}$$

The estimated probability density distribution then becomes: 

$$p(\mathbf{p} \mid sp_j) = \sum_{i=1}^{N_p} \frac{f_{S_i}}{V(S_i)} \, 1_{S_i}(\mathbf{p})$$
Estimation of the probabilities 
The probability functions (probability mass functions) are estimated for 
those parameters which are regarded as discrete probability variables, that is to say 
in particular the excitation from the codebook, which is optimized using a 
closed-loop procedure, and the fundamental period of the speech signal. These probability 
functions are defined as the frequencies of the given parameter codes in the training 
statement for the respective speaker. 
Storage of the probability distributions 

The speech parameters are not all calculated at the same time, but 
successively, in a speech coder. For example, the short-term predictor parameters 
are calculated first, and the remaining parameters for already known short-term 
predictor parameters are then optimized with regard to the synthesis or the 
prediction error. This allows effective storage of the probability distributions as 
conditional probabilities of the code vectors in a tree structure. This is possible 
due to the following relationship: 

$$P(\mathbf{p}_k, \mathbf{p}_l, \mathbf{p}_a \mid sp_j) = P(\mathbf{p}_k \mid sp_j)\, P(\mathbf{p}_l \mid sp_j, \mathbf{p}_k)\, P(\mathbf{p}_a \mid sp_j, \mathbf{p}_k, \mathbf{p}_l)$$

A major simplification can be achieved if the speech parameters within a 
signal frame can be assumed to be statistically independent. The above formula 
then becomes: 

$$P(\mathbf{p}_k, \mathbf{p}_l, \mathbf{p}_a \mid sp_j) = P(\mathbf{p}_k \mid sp_j)\, P(\mathbf{p}_l \mid sp_j)\, P(\mathbf{p}_a \mid sp_j)$$

where: 
$\mathbf{p}_k$: Vector for a short-term parameter 
$\mathbf{p}_l$: Vector for a long-term parameter 
$\mathbf{p}_a$: Vector for an excitation parameter 
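
The two storage schemes can be sketched in Python as follows. The sketch assumes that each parameter vector is represented by its code vector index and that the stored probabilities for one speaker are held either in a nested dictionary (tree of conditional probabilities) or in three separate tables (independence assumption). The data layout, the function names and all numerical values are hypothetical.

```python
def frame_probability_tree(codes, tree):
    """Joint probability of one frame from a tree of conditional probabilities.

    codes: (k, l, a) code indices of the short-term, long-term and
    excitation parameters. tree[k] = (P(k), {l: (P(l | k), {a: P(a | k, l)})}).
    """
    k, l, a = codes
    p_k, level_l = tree[k]
    p_l, level_a = level_l[l]
    return p_k * p_l * level_a[a]


def frame_probability_independent(codes, tables):
    """Joint probability under the independence assumption.

    tables: three dictionaries mapping a code index to its probability
    for the given speaker; unseen codes are given probability 0.
    """
    return (tables[0].get(codes[0], 0.0)
            * tables[1].get(codes[1], 0.0)
            * tables[2].get(codes[2], 0.0))


# Hypothetical speaker data for a single path through the tree.
tree = {0: (0.5, {1: (0.75, {1: 0.9})})}
tables = ({0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}, {0: 0.1, 1: 0.9})

print(frame_probability_tree((0, 1, 1), tree))
print(frame_probability_independent((0, 1, 1), tables))
```

With independent tables the storage grows with the sum of the codebook sizes, whereas the tree grows with their product in the worst case, which is the simplification referred to above.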
The probability densities need to be stored at a very large number of points 
in parameter space in the system. The number of bits used for storing probability 
densities is critical to the complexity of the overall system. A vector quantizer is 
therefore used for the probability values. This makes it possible to reduce the 
number of bits used for storing the probability distributions. 
System safety and security 

In order to prevent the system from being outwitted, noise which is known to
the system is transmitted at the same time that the voice of the speaker is being
recorded, and this noise is subtracted from the digitized speech signal.

The present invention can be used for access control applications, such as
voice-controlled doors, or for verification, for example for bank access systems.
The procedure can be implemented as a program module on a processor which
carries out the task of speaker identification in the system.

An exemplary embodiment of the invention is described with reference to
Figures 7 and 8a to 8m.

Although the present invention has been described with reference to specific
embodiments, those of skill in the art will recognize that changes may be made
thereto without departing from the spirit and scope of the invention as set forth in
the hereafter appended claims.
ABSTRACT OF THE DISCLOSURE

Method for identification of speakers on the basis of their voices

A method for speaker identification using
parameters of an LPAS coder or of a parametric coder for modeling the probability
distribution for the speaker classes.
Description

Method for identification of speakers on the basis of
their voices



The invention relates to a method for identification of 
speakers on the basis of their voices. 

The object on which the invention is based is to 
specify a method for identification of speakers on the 
basis of their voices, which method is robust, safe, 
secure and reliable. 

According to the invention, this object is achieved by 
the features specified in patent claim 1. 

The invention will be described in more detail in the 
following text using a flowchart. 

1. 

The invention allows the identification of the speaker 
on the basis of his voice. The problem of speaker 
identification is to distinguish between different 
speakers or to check the predetermined speaker 
identity, with the only input information being the 
recording of the voice of the speaker. 

Furthermore, a method is proposed which prevents the 
access system from being outwitted when the voice and 
the keyword are recorded by third parties. 

When complex probability distributions for the speech 
parameters of a speaker are stored, a compromise must 
be made between accuracy and memory requirement. For 
this reason, methods for storage of the probability 
distributions have been proposed which can be used as a 
function of the number of speakers. 
2.

Until now, the speaker has been identified, for
example, with the aid of hidden Markov models or by
vector quantization; see reference [1].

3. 

The invention solves the problem of speaker
identification based on the parameters of an
analysis-by-synthesis coder using linear prediction
(LPAS) [1] (for example a harmonic vector excited codec
[5] or a waveform interpolation codec [4]). The speech
signal parameters used in the past, such as cepstral AR
parameters, do not provide a satisfactory solution to
the problem. For this reason, other parameters need to
be used, such as parameters relating to the excitation
of the vocal tract, which include information that is
dependent on the speaker and is at the same time
largely phoneme-independent.



Furthermore, the method for estimation of the
probability distribution of the coder parameters for
the respective speaker is given, as well as a method
which prevents the access system from being outwitted.



Speaker identification



In systems for speaker identification, statistical
principles [2] are used to check whether the spoken
sentence has been spoken by one of the speakers covered
by the speaker identification system. There are in
principle two types of speaker identification systems:
text-dependent systems and text-independent systems.
For the procedure described in the invention, text
independence of the system is achieved by means of an
expanded training phase, in which the speaker has to
record a wide range of material, and the probability
distributions of said speech signal parameters are
established from all the spoken material. A
text-dependent system can be trained more easily, since
the material spoken by the speaker during the usage
phase is limited to a number of keywords or specific
sentences. The preparation phase is continued until the
system reliably identifies the voice of the speaker.

The object of speaker identification is illustrated in
Figure 1 (Problem of speaker identification).

Speaker identification is dealt with as a
multiple-hypothesis detection problem [2]. The classes
to be distinguished between, one for each speaker who
is intended to be identified by the system, are
referred to as $sp_i$, $i = 1..M$, where $M$ is the number of
speakers covered by the speaker identification system.
Speaker identification is based on recorded signals
spoken by the respective speaker. The speech signal is
segmented into signal frames $\mathbf{x} = [x(1)..x(K)]$ ($K = 160$
for a signal frame with a length of 20 ms and a
sampling frequency of 8 kHz, for example). The
segmentation process produces the speech signal frames
$\mathbf{x}(1)..\mathbf{x}(N)$, where $N$ depends on the total length of the
sentence or keyword spoken by the speaker. The decision
on the speaker is made from the probabilities or
probability densities (referred to jointly as
probability scores) that the sample vectors
$\mathbf{x}(l)$, $l = 1..N$, belong to the class $sp_i$. The
statistically optimum decision scheme selects that
class $sp_i$ having the highest probability value for
given $\mathbf{x}(l)$, $l = 1..N$. This means that the frames $\mathbf{x}(l)$
are assigned to the class $sp_j$ for which:

$$p(\mathbf{x}(1)..\mathbf{x}(N) \mid sp_j) > p(\mathbf{x}(1)..\mathbf{x}(N) \mid sp_i) \quad \text{for all } i \neq j$$
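
The decision rule can be illustrated with a minimal sketch. It assumes that the signal frames are statistically independent, so that the joint score is the product of the per-frame scores, and it sums logarithms instead of multiplying, which is a numerical choice made here and not something the description prescribes. The function name `identify_speaker` and the example scores are hypothetical.

```python
import math

def identify_speaker(frame_log_scores):
    """Pick the speaker class with the highest combined score.

    frame_log_scores maps a speaker label to a list of per-frame
    log scores log p(x(l) | sp_i), l = 1..N (hypothetical input format).
    """
    # Under the frame-independence assumption the joint log score is a sum.
    totals = {sp: sum(scores) for sp, scores in frame_log_scores.items()}
    best = max(totals, key=totals.get)
    return best, totals

# Illustrative per-frame scores for M = 2 speakers and N = 3 frames.
scores = {
    "sp1": [math.log(0.20), math.log(0.10), math.log(0.30)],
    "sp2": [math.log(0.05), math.log(0.40), math.log(0.10)],
}
best, totals = identify_speaker(scores)
print(best)
```

In this example the product of the frame scores is 0.006 for `sp1` and 0.002 for `sp2`, so `sp1` is selected.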


Speaker verification 

The problem of speaker verification is to check the
predetermined identity of the speaker on the basis of
his voice. This corresponds to the situation
illustrated in Figure 2 (Problem of speaker
verification).

The process of speaker verification is carried out in a
similar manner to that of speaker identification, that
is to say the spoken sentence is likewise segmented.
However, after this, the voice is not classified, but a
probability score is calculated for the predetermined
speaker identity, and is compared with a threshold. The
identity of the speaker is thus confirmed on the basis
of his voice when:



$$p(\mathbf{x}(1)..\mathbf{x}(N) \mid sp_j) > \text{threshold}$$



where $sp_j$ corresponds to the predetermined speaker
identity. The threshold must be set sufficiently high
to avoid the situation in which a speaker with a
different identity to that predetermined is
accepted/authorized.
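
The threshold test can be sketched as follows. The sketch assumes independent frames and works with log scores, so the threshold is also given as a logarithm; the function name `verify_speaker` and the numeric values are illustrative assumptions.

```python
import math

def verify_speaker(frame_log_scores, log_threshold):
    """Confirm the claimed identity if the combined score exceeds the threshold.

    frame_log_scores holds log p(x(l) | sp_j) for the claimed speaker sp_j.
    """
    total = sum(frame_log_scores)  # log of the product of the frame scores
    return total > log_threshold

# Illustrative scores for the claimed identity; their product is 0.006.
claimed = [math.log(0.20), math.log(0.10), math.log(0.30)]
accepted = verify_speaker(claimed, math.log(0.004))  # lenient threshold
rejected = verify_speaker(claimed, math.log(0.010))  # stricter threshold
print(accepted, rejected)
```

Raising the threshold makes a false acceptance less likely, at the cost of rejecting the genuine speaker more often.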

LPAS coder

The speech coding methods used nowadays are
predominantly based on the analysis-by-synthesis method
using an LPC synthesis filter [2]. In these methods,
speech coding is optimized by repetition of the coding
and decoding operations until the optimum parameter set
is found for the given speech section.

One of the most widely used types of LPAS coder is the
CELP coder. One relatively new development is the
harmonic vector excited codec, where the form of the
excitation signals is particularly suitable for the
described task. Figure 3 (Design of an LPAS coder)
illustrates the synthesis model of a CELP coder. The
synthesis model defines the method for calculating the
synthesized speech signal from the quantized parameters
of the speech signal. In general, each LPAS coder has
the following parameter groups:

• Short-term predictor parameters. The short-term
predictor parameters are generally calculated by
means of classical LPC analysis, using the
correlation method or the covariance method for
linear prediction [3]. 8-10 LPC coefficients are used
for signal frames with a length of 20 to 30 ms and a
sampling rate of 8 kHz. The short-term predictor
parameters may occur in various forms (for example
as reflection coefficients or as line spectrum
frequencies, LSF), depending on the representation
which can be quantized better. It has been found that
the LSF coefficients are most suitable for
quantization, and this form of prediction
coefficients is generally used. The short-term
predictor parameters are calculated using an
open-loop procedure, that is to say without the
overall optimization, illustrated in Figure 1, with
the other parameters relating to the synthesis error.



• Long-term predictor parameters. Long-term predictor
parameters are used in a filter which synthesizes the
fundamental frequency of the speech signal. This is
generally a long-term predictor with a filter
coefficient and a parameter for the fundamental
period of the voice signal. A long-term predictor
with the parameters b = [b,N] is a part of Figure 2.
The long-term predictor parameters are likewise
calculated using an open-loop procedure, without
overall optimization with the other parameters. In
some coders, a refined search based on the long-term
predictor parameters is carried out using a
closed-loop procedure.



• Excitation parameters. In a CELP coder, the 5-10 ms
subframes of the remaining signal are vector-quantized
using a closed-loop procedure. The transmitted
parameters allow the signal forms to be reproduced at
the decoder end from the stored codebook.



In an HVXC codec, the output from the LPC analysis
filter is transformed to the frequency domain and the
fundamental-period-normalized spectral envelope is
vector-quantized.
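
As an illustration of the open-loop short-term analysis, the following minimal sketch computes LPC predictor coefficients for one frame with the correlation (autocorrelation) method and the Levinson-Durbin recursion. It is a textbook procedure and not the analysis routine of any particular coder; windowing, the conversion to LSF and quantization are omitted, and the synthetic test frame is an assumption made for the example.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one signal frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; x[n] is predicted as sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy for order 0
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

# Synthetic 20 ms frame at 8 kHz (160 samples) from a stable AR(2) process.
rng = random.Random(0)
frame = [0.0, 0.0]
for _ in range(160):
    frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + rng.gauss(0.0, 1.0))
frame = frame[2:]

order = 10
r = autocorrelation(frame, order)
coeffs, err = levinson_durbin(r, order)
print(len(coeffs))
```

The ten coefficients returned here correspond to the non-quantized short-term predictor parameters of a single frame.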
Speaker identification using the parameters of an LPAS
coder

The speech coder parameters provide a comprehensive
description of the possible speech signals using
considerably fewer parameters than when the speech
signal is represented as a sequence of samples.

The decomposition of the speech signal into the said
parameter groups can be used in various ways for
speaker identification. The methods used for
calculating the parameters and synthesizing the speech
signal determine the appropriate estimation method (for
example, probabilities for parameters which are
regarded as discrete random variables). Parameters
determined using a closed-loop procedure should
actually be regarded as discrete random variables,
since it is impossible to assign volumes to the
parameter space regions of the vector quantizer for
parameters such as these. This relates in particular to
the excitation parameters. The probability
distributions for such parameters are estimated by
calculating relative frequencies of the
parameters/code vectors in the training statement.
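
The relative-frequency estimate for discrete parameters can be sketched as follows. The code vector indices and the codebook size are made-up example values, and the function name `estimate_pmf` is hypothetical.

```python
from collections import Counter

def estimate_pmf(code_indices, codebook_size):
    """Relative frequency of each code vector index in a training statement."""
    counts = Counter(code_indices)
    total = len(code_indices)
    return [counts.get(i, 0) / total for i in range(codebook_size)]

# Code vector indices observed over eight training frames of one speaker.
training_indices = [0, 2, 2, 1, 2, 0, 3, 2]
pmf = estimate_pmf(training_indices, codebook_size=5)
print(pmf)  # [0.25, 0.125, 0.5, 0.125, 0.0]
```

Index 4 never occurs in the training statement and therefore receives probability zero; a practical system might smooth such entries, which is outside the scope of this sketch.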

Parameters which are calculated using an open-loop
procedure in the coder are initially available in a
non-quantized form and are quantized only after this,
with vector quantization generally being used. For
parameters such as these, the probability densities can
be estimated from the training statement. This approach
is used primarily for the short-term predictor
parameters.

The probability density estimation is based on the
histogram method [6]. This method requires knowledge of
the volumes of those regions of the parameter space
which are linked to the quantized points.
A method for storage of probability distributions is
obtained, according to Figure 5 (Speaker identification
using the parameters of an LPAS coder), if the possible
code vectors for the speech signal parameters are
stored once for the entire population, which
corresponds to the situation where the quantization
steps/code vectors are determined once, from a database
which contains recordings by a large number of
speakers. The probability distributions of the
parameters for the speakers are then stored in the
system together with the indices of the code vectors
for the parameters. This is suitable for large systems
with a very large number of users (ATMs, access systems
in companies).

Another method is for the code vectors for the
parameters for each speaker to be trained individually.
The code vectors are then stored together with the
values of the probability densities at those parameter
space points defined by the code vectors. One possible
way of carrying out this method is shown in Figure 6
(Speaker identification using the parameters of an LPAS
coder; probability densities are stored together with
the code vectors for the parameters). This method is
intended for a small number of speakers (for example
for a voice-controlled door in a dwelling).

Training phase of a speaker identification system

The probability density distributions for the speaker
classes are estimated from the training material. For
text-dependent speaker identification (speaker
identification/speaker verification), a specific
sentence or keyword is repeated during the training
phase until the speaker identification operates
reliably.

Phonetically balanced spoken material must be recorded
for text-independent speaker verification. In this case
as well, the training phase must be repeated until the
speaker identification/verification operates reliably.
The material recorded during the training phase is
used for training a number of times, in each case with
a different phase shift, in order to make the speaker
identification system independent of the initial phase
of the recorded voices. The data used for training are
referred to as the training statement $TS_{sp_i}$, with
$sp_i$ denoting the speaker.
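
One way to read the repeated use of the training material with a phase shift is to segment the same recording several times, each time starting the frame grid at a different sample offset. The following sketch follows that reading; the choice of equally spaced offsets and the function name `frames_with_phase_shifts` are assumptions, since the description does not specify how the shifts are selected.

```python
def frames_with_phase_shifts(samples, frame_length, num_shifts):
    """Segment a recording repeatedly, each pass starting at a different offset."""
    step = frame_length // num_shifts  # spacing between the start offsets
    frames = []
    for s in range(num_shifts):
        offset = s * step
        last_start = len(samples) - frame_length
        for start in range(offset, last_start + 1, frame_length):
            frames.append(samples[start:start + frame_length])
    return frames

# 100 ms of dummy samples at 8 kHz, 20 ms frames, offsets 0, 40, 80 and 120.
signal = list(range(800))
frames = frames_with_phase_shifts(signal, frame_length=160, num_shifts=4)
print(len(frames))  # 17
```

The pass with offset 0 yields five complete frames and each shifted pass yields four, so the training statement grows from 5 to 17 frames without additional recordings.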

Estimation of the probability densities 

In order to describe the method according to the
invention for estimation of the probability densities
of the parameters for the speaker classes, a number of
necessary definitions will be introduced first of all.
The introduced abstraction of the coding process has
the advantage that the estimation of the probability
densities can be described in a simple manner without
needing to go into details of the highly complicated
operations in the speech coder. A detailed description
of the parameter calculation process can be found in
[4] and [5].

A speech coder operates in evaluation intervals. The
operations described in the section on the LPAS coder
are carried out in the speech coder for each signal
frame, and supply the parameters of the speech signal
for the respective frame.

Calculation of a non-quantized parameter vector $p$ from
the signal frame $x$ in an open-loop optimization
procedure is written as $p = K_p(x)$. The quantization of
the parameter is written as $\hat{p} = Q_p(p)$. That region
in the parameter space of the parameter $p$ which is
mapped onto the code vector $\hat{p}$ in the coding process
is referred to as $S_{\hat{p}} = \{p : Q_p(p) = \hat{p}\}$. The volume of
this region is referred to as $V(S_{\hat{p}})$.

The set of possible code vectors for the parameter $p$ is
written as $C_p = \{\hat{p}_i;\ i = 1..N_p\}$, where $N_p$ is the number
of code vectors. The set of regions which are linked to
the code vectors is referred to as $R_p = \{S_i;\ i = 1..N_p\}$.
The association function for a region $S_i$ is referred
to as:

$$1_{S_i}(p) = \begin{cases} 1 & \text{for } p \in S_i \\ 0 & \text{for } p \notin S_i \end{cases}$$



The frequency of occurrence of a parameter in the
training statement is calculated using

$$f_{S_i} = \frac{\text{number of parameter values from the training statement } TS_{sp_j} \text{ which occur in the region } S_i}{\text{number of parameter values from the training statement } TS_{sp_j}}$$

The estimated probability density distribution then
becomes:

$$\hat{p}(p \mid sp_j) = \sum_{i=1}^{N_p} 1_{S_i}(p)\,\frac{f_{S_i}}{V(S_i)}$$
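
The histogram estimate divides the relative frequency of each quantizer region by the volume of that region. The following minimal sketch shows this for a scalar parameter, where each region is an interval around a code value and its volume is the interval length. The bounded parameter range `lower`..`upper`, the nearest-neighbor cell boundaries and all example values are assumptions made for the illustration; for vector parameters the region volumes would have to be determined separately.

```python
import bisect

def cell_edges(code_values, lower, upper):
    """Boundaries of the nearest-neighbor cells of a sorted scalar codebook."""
    mids = [(a + b) / 2 for a, b in zip(code_values, code_values[1:])]
    return [lower] + mids + [upper]

def histogram_density(training_values, code_values, lower, upper):
    """Density value per cell: relative frequency divided by cell volume."""
    edges = cell_edges(code_values, lower, upper)
    counts = [0] * len(code_values)
    for p in training_values:
        i = bisect.bisect_right(edges, p) - 1
        i = max(0, min(i, len(code_values) - 1))  # clamp values on the borders
        counts[i] += 1
    total = len(training_values)
    density = [(c / total) / (edges[i + 1] - edges[i])
               for i, c in enumerate(counts)]
    return density, edges

codebook = [1.0, 2.0, 4.0]
training = [0.5, 1.0, 1.2, 2.0, 2.5, 3.5, 4.0, 5.0]
density, edges = histogram_density(training, codebook, lower=0.0, upper=6.0)
print(density)
```

With these values the cells are [0, 1.5], [1.5, 3] and [3, 6], the relative frequencies are 3/8, 2/8 and 3/8, and the density values are 0.25, about 0.167 and 0.125, which integrate to one over the parameter range.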



Estimation of the probabilities 

The probability functions (probability mass functions)
are estimated for those parameters which are regarded
as discrete random variables, that is to say in
particular the excitation from the codebook, which is
optimized using a closed-loop procedure, and the
fundamental period of the speech signal. These
probability functions are defined as the relative
frequencies of the given parameter codes in the
training statement for the respective speaker.

Storage of the probability distributions 

The speech parameters are not all calculated at the
same time, but successively, in a speech coder. For
example, the short-term predictor parameters are
calculated first, and the remaining parameters for
already known short-term predictor parameters are then
optimized with regard to the synthesis or the
prediction error. This allows effective storage of the
probability distributions as conditional probabilities
of the code vectors in a tree structure. This is
possible thanks to the following relationship:

$$P(p_K, p_L, p_A \mid sp_j) = P(p_K \mid sp_j)\,P(p_L \mid sp_j, p_K)\,P(p_A \mid sp_j, p_K, p_L)$$

$p_K$ - Vector for a short-term parameter
$p_L$ - Vector for a long-term parameter
$p_A$ - Vector for an excitation parameter



A major simplification can be achieved if the speech
parameters within a signal frame can be assumed to be
statistically independent. The above formula then
becomes:



$$P(p_K, p_L, p_A \mid sp_j) = P(p_K \mid sp_j)\,P(p_L \mid sp_j)\,P(p_A \mid sp_j)$$
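
The difference between the tree-structured storage and the simplified storage under the independence assumption can be illustrated with a minimal sketch. All table values are invented for the example, each parameter group has only two code vectors, and nested dictionaries stand in for the tree; none of this is prescribed by the description.

```python
import math

# Hypothetical tables for one speaker; every code vector index is 0 or 1.
P_SHORT = {0: 0.6, 1: 0.4}                      # P(p_K | sp)
P_LONG_GIVEN_SHORT = {0: {0: 0.8, 1: 0.2},      # P(p_L | sp, p_K)
                      1: {0: 0.5, 1: 0.5}}
P_EXC_GIVEN_BOTH = {                            # P(p_A | sp, p_K, p_L)
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.3, 1: 0.7},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.2, 1: 0.8},
}
# Independence simplification: one marginal table per parameter group.
P_LONG = {0: 0.7, 1: 0.3}                       # P(p_L | sp)
P_EXC = {0: 0.6, 1: 0.4}                        # P(p_A | sp)

def log_score_tree(k, l, a):
    """Frame log score from the conditional (tree) tables."""
    return (math.log(P_SHORT[k])
            + math.log(P_LONG_GIVEN_SHORT[k][l])
            + math.log(P_EXC_GIVEN_BOTH[(k, l)][a]))

def log_score_independent(k, l, a):
    """Frame log score when the parameter groups are assumed independent."""
    return math.log(P_SHORT[k]) + math.log(P_LONG[l]) + math.log(P_EXC[a])

tree_entries = 2 + 2 * 2 + 4 * 2   # stored values in the tree
independent_entries = 2 + 2 + 2    # stored values under independence
print(tree_entries, independent_entries)  # 14 6
```

The number of stored values grows multiplicatively with the codebook sizes in the tree and only additively under the independence assumption, which is the simplification referred to above.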

The probability densities need to be stored at a very
large number of points in parameter space in the
system. The number of bits used for storing probability
densities is critical to the complexity of the overall
system. A vector quantizer is therefore used for the
probability values. This makes it possible to reduce
the number of bits used for storing the probability
distributions.
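
How a vector quantizer reduces the storage for probability values can be sketched as follows. The description does not say how the probability values are grouped or how the quantizer is trained, so the grouping into pairs, the plain k-means (Lloyd) training with a deterministic start and all numeric values are assumptions of this sketch.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(codebook, v):
    """Index of the codebook entry closest to vector v."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], v))

def train_codebook(vectors, size, iterations=20):
    """Plain k-means; starts from the first `size` vectors for repeatability."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for v in vectors:
            groups[nearest(codebook, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old entry if no vector was assigned to it
                codebook[i] = [sum(col) / len(group) for col in zip(*group)]
    return codebook

# Stored probability values, grouped into vectors of dimension 2.
values = [0.01, 0.02, 0.30, 0.31, 0.02, 0.01, 0.29, 0.32]
vectors = [values[i:i + 2] for i in range(0, len(values), 2)]
codebook = train_codebook(vectors, size=2)
indices = [nearest(codebook, v) for v in vectors]
print(indices)  # [0, 1, 0, 1]
```

Each pair of probability values is then stored as a one-bit index into the two-entry codebook instead of two full-precision numbers, at the cost of a small quantization error.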

System safety and security 

In order to prevent the system from being outwitted,
noise which is known to the system is transmitted at
the same time that the voice of the speaker is being
recorded, and this noise is subtracted from the
digitized speech signal.
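
The subtraction of the known noise can be sketched in a strongly idealized form. The sketch assumes sample-aligned signals and no acoustic channel between loudspeaker and microphone, and it generates the noise from a seed known only to the system; all of these are assumptions. The replay case at the end reflects one possible reading of why the measure helps: a recording made during an earlier session contains an earlier noise signal, which the subtraction of the current noise does not remove.

```python
import random

def make_known_noise(seed, length, amplitude=0.1):
    """Pseudo-random noise that the system can regenerate from the seed."""
    rng = random.Random(seed)
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(length)]

def subtract_known_noise(recording, known_noise):
    """Remove the known noise from the digitized recording, sample by sample."""
    return [r - n for r, n in zip(recording, known_noise)]

def energy(signal):
    return sum(s * s for s in signal)

speech = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4]

# Live session: the current noise is played while the speaker is recorded.
current_noise = make_known_noise(seed=2, length=len(speech))
live_recording = [s + n for s, n in zip(speech, current_noise)]
cleaned = subtract_known_noise(live_recording, current_noise)

# Replay of a recording captured during an earlier session (different noise).
old_noise = make_known_noise(seed=1, length=len(speech))
replayed = [s + n for s, n in zip(speech, old_noise)]
replay_residual = [c - s for c, s in
                   zip(subtract_known_noise(replayed, current_noise), speech)]
print(energy(replay_residual) > 0.0)
```

In the live case the cleaned signal equals the speech up to rounding, while the replayed recording keeps a noise residual after the subtraction.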

5. 

The invention can be used for access control
applications, such as voice-controlled doors, or for
verification, for example for bank access systems. The
procedure can be implemented as a program module on a
processor which carries out the task of speaker
identification in the system.

An exemplary embodiment of the invention is described
with reference to Figures 7 and 8a to 8m.



AMENDED SHEET 






[1] S. Furui, "Recent advances in speaker recognition", Pattern Recognition Letters, Tokyo Inst. of Technol., 1997
[2] P. Vary, U. Heute, W. Hess, Digitale Sprachsignalverarbeitung [Digital speech signal processing], B.G. Teubner, Stuttgart, 1998
[3] K. Kroschel, Statistische Nachrichtentheorie [Statistical information theory], 3rd ed., Springer-Verlag, 1997
[4] W.B. Kleijn, K.K. Paliwal, Speech Coding and Synthesis, Elsevier, 1995
[5] ISO/IEC 14496-3, MPEG-4 HVXC Speech Coder description
[6] Prakasa Rao, Functional Estimation, Academic Press, 1982



1999P02665



Patent Claims 



1. A method for identification of speakers on the
basis of their voices, having the following
features:

(a) in a preparation phase,

(a1) k text-dependent or text-independent
reference spoken expressions, which form a
speaker-related training statement, from M
speakers are segmented into first speech signal
frames of length L,

(a2) the first speech signal frames are supplied 
to an analysis-by-synthesis coder based on linear 
prediction, 

(a3) a first short-term predictor parameter, long- 
term predictor parameter and/or excitation 
parameter for the coder are/is calculated in the 
analysis-by-synthesis coder for each of the M 
speakers and for each first speech signal frame in 
each case, with the parameters then forming 
speaker-related training material, 

(a4) the frequency of the respective occurrence of 
the first short-term predictor parameter, of the 
long-term predictor parameter and/or of the 
excitation parameter for the coder in the speaker- 
related training statement and/or the probability 
densities with which the first short-term 
predictor parameter, the long-term predictor 
parameter and/or the excitation parameter are/is 
contained in the speaker-related training 
statement are/is calculated in the analysis-by- 
synthesis coder for each of the M speakers and for 
each first speech signal frame in each case, 
(a5) the calculated frequencies and/or probability 
densities are stored on a speaker-related basis as 
speaker data, 
(b) in a simulated usage phase of the training
phase,

(b1) a text-dependent or text-independent
simulation spoken expression of an m-th speaker
where m=1..M is segmented into second speech
signal frames of length L,

(b2) the second speech signal frames are supplied
to the analysis-by-synthesis coder,

(b3) a second short-term predictor parameter,
long-term predictor parameter and/or excitation
parameter for the coder are/is calculated
in the analysis-by-synthesis coder for the m-th
speaker and for each second speech signal frame in
each case,

(b4) first probability scores are calculated for
each second speech signal frame from the
calculated second short-term predictor parameter,
long-term predictor parameter and/or excitation
parameter and the speaker data stored for the m-th
speaker in the preparation phase, which
probability scores indicate the probability with
which the second short-term predictor parameter,
long-term predictor parameter and/or excitation
parameter match(es) the first short-term predictor
parameter, long-term predictor parameter and/or
excitation parameter,

(b5) the first probability scores from all the
second speech signal frames are combined,

(b6) a check is carried out to determine whether
the combined first probability scores are greater
than a predetermined first threshold which
confirms the voice of the m-th speaker, when the
combined first probability scores are greater than
the predetermined first threshold or the
preparation phase continues for a further i
reference spoken expressions by the m-th speaker
until the voice of the m-th speaker is confirmed,
when the combined first probability scores are
less than or equal to, or are less than, the
predetermined first threshold,

(c) in a usage phase 

(c1) a text-dependent or text-independent used
spoken expression of the m-th speaker where m=1..M
is segmented into third speech signal frames of
length L,

(c2) the third speech signal frames are supplied 
to the analysis-by-synthesis coder, 

(c3) a third short-term predictor parameter,
long-term predictor parameter and/or excitation
parameter for the coder are/is calculated in the
analysis-by-synthesis coder for the m-th speaker
and for each third speech signal frame in each
case,

(c4) second probability scores are calculated for
each third speech signal frame from the
calculated third short-term predictor parameter,
long-term predictor parameter and/or excitation
parameter and the speaker data stored for the
m-th speaker in the preparation phase, which
second probability scores indicate the probability
with which the third short-term predictor
parameter, long-term predictor parameter and/or
excitation parameter have been spoken by the m-th
speaker,

(c5) the second probability scores from all the
third speech signal frames are combined,

(c6) a check is carried out to determine whether
the combined second probability scores are greater
than a predetermined second threshold and the
voice of the m-th speaker is identified when the
combined second probability scores are greater than
the predetermined second threshold, or the voice
of the m-th speaker is not identified when the
combined second probability scores are less than
or equal to, or are less than, the predetermined
second threshold.

2. The method as claimed in claim 1, characterized in
that
a harmonic vector excited predictive coder or a
waveform interpolating coder is used, in
particular, as a parametric coder.

3. The method as claimed in claim 1, characterized in
that
a coder based on linear prediction, in particular
an LPAS coder, is used as the analysis-by-synthesis
coder.

4. The method as claimed in one of claims 1 to 3,
characterized in that
the frequencies and/or probability densities are
quantized using a vector quantizer having a
specific, considerably reduced, number of bits.

5. The method as claimed in one of claims 1 to 4,
characterized in that
noise which is known to the speaker identification
system is also entered when the spoken expression
of the speaker is entered into the speaker
identification system.

6. The method as claimed in one of claims 1 to 5,
characterized in that
the noise which is also entered is subtracted
internally, before the segmentation, from the
recording of the speaker voice.
Abstract

Method for identification of speakers on the basis of 
their voices 

The invention relates to a method for speaker 
identification using parameters of an LPAS coder or of 
a parametric coder for modeling the probability 
distribution for the speaker classes. 
FIG 1

The voice originates from which speaker?

FIG 2

FIG 4

FIG 5

Speaker's voice

Code vectors for the parameters of the LPAS coder

Coding operation: open loop, closed loop parameter calculation

Probability distributions of the coded parameters for speaker 1

Coded parameters

Decision on the speaker

Probability distributions of the coded parameters for speaker M

Speaker identity
FIG 6

Speaker's voice

Calculation of the non-quantized parameters using an open-loop procedure

Parameter quantization and calculation of the probability scores

Probability distributions and the code vectors of the parameters for speaker 1

Parameter quantization and calculation of the probability scores

Probability distributions and the code vectors of the parameters for speaker M

Decision

Speaker identity
FIG 7

Speaker verification using the parameters of an LPAS coder

Recorded speaker voice: a specific keyword or sentence for text-dependent speaker verification, any desired text for text-independent speaker verification

Segmentation of the speech signal into signal frames of length 20-30 ms

Calculation of the non-quantized speech parameters. The short-term predictor parameters, the long-term predictor parameters and the long-term remaining signal are calculated

Predetermined speaker identity

Calculation, for each frame, of the probability scores (probabilities or probability densities)

Speaker data: Probability distribution of speech parameters

Combination of probability scores from all signal frames. It is assumed that the signal frames for the speech signal are statistically independent

Decision as to whether the predetermined identity of the speaker and the voice of the speaker match
FIG 8a

Speaker identification system preparation phase*
(Profile for speaker j)

Training of a text-independent system: Recording of widely phonetically balanced material from the j-th, j=1..M, system user. A relatively large number 1..K of reference spoken expressions.

Training of a text-dependent system: Specific word sequence, a sentence or a keyword. Corresponding number 1..K of reference spoken expressions for the j-th, j=1..M, system user.

Segmentation of the training material into signal frames x(1)...x(N), where N is dependent on the total length of the spoken expressions. x(i)=[x(1)...x(L)], where L is the length of the signal frame.

Large number of speakers >10

Speech database: several hours of recordings by different speakers

Training of the speaker-independent codebooks for the short-term parameters with the aid of the K-means algorithm. $Cb_K = [C_{Ki} \in R^p,\ i = 1..L_K]$, where $L_K$ is the number of codebook entries and p = 8..10 is the length of the LSF code vector.

Training of the speaker-independent codebooks for the excitation parameters. Codebooks for the fundamental-period-normalized spectral forms of the LPC remaining signal, $Cb_A = [C_{Ai} \in R^p,\ i = 1..L_A]$ ($L_A$ is the number of code vectors and p = 44 is the length of the code vector). Parameters in the same form as in the HVXC codec.

* The process defined from hereon is carried out for each new user of the speaker identification
system. The aim of the preparation phase is to produce the speaker data for each of the M speakers.
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PIS 8b 



Small number 
of speakers <10 



© 



The codebooks are trained 
for each of the M speakers 
using the material recorded 
by the respective speaker 



Training of the speaker-independent codebooks 
for the short-term parameters with the aid of 
the K-mean algorithm Cb k = [C ki eR p , i=l..N k ], 
N k is the number of codebook entries and 
p = 8..10 length of the LSF code vector. 



Training of the speaker-independent codebooks 
for the excitation parameters. Codebooks for the 
fundamental-period -normalized spectral forms 
of the LPC remaining signal Cbk = [C Aj eR p , 
i=1 ..N A ], (N A is the number of code vectors and 
p = 44 length of the harmonic code vector). 
Parameters in the same form as in the HVXC 
codec. 




(Z) 



: Calculation of the speech parameters for 
: the training sets for each speaker based 
L on the layout o f an H VXC codec* 




Calculation of the short-term
parameters for each signal
frame: $p_K(i),\ i = 1..N$.
Training set for the short-term
parameters is formed
for the respective speaker:
$TSK_j = \{p_K(i),\ i = 1..N\},\ j = 1..M$



Calculation of the long-term
parameters for each signal
frame: $p_L(i),\ i = 1..N$.
Training set for the long-term
parameters is formed for the
respective speaker:
$TSL_j = \{p_L(i),\ i = 1..N\}$



Calculation of the excitation
parameters for each signal
frame: $p_A(i),\ i = 1..N$.
Speech fundamental-period-normalized
spectral forms of
the LPC remaining signal.
Training set for the excitation
parameters is formed for the
respective speaker:
$TSA_j = \{p_A(i),\ i = 1..N\}$






* ISO/IEC 14496-3 Information Technology - Very low bit rate audio-visual coding
FIG 8c



Calculation of the volumes of the Voronoi 
cell regions for the probability density estimate 
of the short-term predictor parameters 






Calculation of the volumes of the vector
quantizer cells

$S_{Ki} = \{x \in \mathbb{R}^p : |C_{Ki} - x| < |C_{Kj} - x|,\ i \neq j\}$

$V(S_{Ki})$



Calculation of the frequencies of the short-term parameters

$f_{Ki} = \dfrac{\text{Number of code vectors } C_{Ki} \text{ contained in } TSK_j}{\text{Number of all code vectors in } TSK_j}$



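Calculation of the probability densities of the short-term predictor parameters:

$p(p_K \mid \text{Speaker}_j) = \sum_{i=1}^{|Cb_K|} 1_{S_{Ki}}(p_K)\, \dfrac{f_{Ki}}{V(S_{Ki})}$

$|Cb_K|$ - Number of code vectors in the codebook $Cb_K$

$1_{S_{Ki}}$ - Association function for the region $S_{Ki}$:

$1_{S_{Ki}}(p) = \begin{cases} 1 & \text{for } p \in S_{Ki} \\ 0 & \text{for } p \notin S_{Ki} \end{cases}$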
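
A rough sketch of this density estimate is given below. The figure does not say how the cell volumes are computed, so the Monte Carlo estimate inside a bounding box used here is purely an assumption (Voronoi cells are unbounded without such a restriction), as are all function names. The frequency of a cell is taken as the share of training vectors that quantize to its code vector.

```python
import random


def nearest_index(codebook, v):
    """Index i of the Voronoi cell (nearest code vector) that contains v."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)))


def cell_frequencies(codebook, training_set):
    """Relative frequency of each cell in the training set."""
    counts = [0] * len(codebook)
    for v in training_set:
        counts[nearest_index(codebook, v)] += 1
    return [c / len(training_set) for c in counts]


def cell_volumes(codebook, low, high, samples=20000, seed=0):
    """Monte Carlo volume estimate of each cell inside the box [low, high]^p.

    Assumption: the parameter space is restricted to this box.
    """
    rng = random.Random(seed)
    p = len(codebook[0])
    hits = [0] * len(codebook)
    for _ in range(samples):
        x = [rng.uniform(low, high) for _ in range(p)]
        hits[nearest_index(codebook, x)] += 1
    box_volume = (high - low) ** p
    return [box_volume * h / samples for h in hits]


def density(codebook, freqs, volumes, v):
    """Piecewise-constant density: frequency over volume of the cell holding v."""
    i = nearest_index(codebook, v)
    return freqs[i] / volumes[i] if volumes[i] > 0 else 0.0
```

Because the indicator function selects exactly one cell for any given parameter vector, the sum in the formula reduces to a single lookup, which is what `density` does.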











Store the probabilities of the short-term predictor
parameters for a large number of speakers

Code vector index 1 | Probability density value 1
...                 | ...
Code vector index J | Probability density value J

J - Number of Voronoi cells whose probability
is not equal to zero.


Store the probabilities of the short-term predictor
parameters for a small number of speakers

Code vector 1 | Probability density value 1
...           | ...
Code vector I | Probability density value I

I - Number of code vectors in the codebook of the
short-term predictor parameters for speaker j.
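
The difference between the two storage forms above can be illustrated with a small sketch. The helper names and the list-of-tuples layout are assumptions made for illustration. The large-population form refers to a shared codebook by index and omits empty cells, while the small-population form stores each speaker's own code vectors alongside their density values.

```python
def store_sparse(densities):
    """Large number of speakers: (code vector index, density) for non-zero cells only.

    Indices are 1-based to match the figure.
    """
    return [(i, d) for i, d in enumerate(densities, start=1) if d != 0.0]


def store_dense(codebook, densities):
    """Small number of speakers: every code vector stored with its density value."""
    return [(tuple(c), d) for c, d in zip(codebook, densities)]
```

With a shared codebook, only the J non-zero entries need to be kept per speaker, which is presumably why the index form is used when the number of speakers is large.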
FIG 8d






Calculation of the frequencies of the 
long-term predictor parameters for the 
speaker j in the training set TSLj 



Storage of the probability distributions of the 
long-term predictor parameters. These 
probability distributions are stored in the 
same way, irrespective of the number of 
speakers 


Speech fundamental period value 1 | Frequency 1
...                               | ...
Speech fundamental period value D | Frequency D
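
As a minimal sketch of this step, the frequencies of the long-term parameters can be obtained by counting how often each fundamental period value occurs in the training set. The function name is hypothetical, and the assumption that the period values are discrete (for example, integer lags in samples) is not stated in the figure.

```python
from collections import Counter


def period_frequencies(training_set):
    """Relative frequency of each fundamental period value in one speaker's training set."""
    counts = Counter(training_set)
    total = len(training_set)
    # One table row per distinct period value, sorted by period.
    return {period: n / total for period, n in sorted(counts.items())}
```

The resulting mapping has one entry per distinct period value, matching the D rows of the table above.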






Storage of the probabilities of the excitation
parameters for a large number of speakers

Code vector index 1 | Probability value 1
...                 | ...
Code vector index D | Probability value D

D - Number of excitation code vectors whose
probability is not equal to zero.



Calculation of the frequencies of the 
excitation parameters for speaker j in 
the training set TSAj 



Storage of the probabilities of the excitation
parameters for a small number of speakers

Code vector 1   | Probability value 1
...             | ...
Code vector L_A | Probability value L_A

L_A - Number of code vectors in the codebook for
the excitation parameters for speaker j
FIG 8e



Simulated usage phase 
Training of the system for speaker i 



Request to record the (K+1)-th
test spoken expression



Simulated usage phase for a
text-independent system



Recording of any desired (K+1)-th
spoken expression by the j-th system
user



Simulated usage phase for a
text-dependent system



Specific word sequence, a sentence or
keyword. (K+1)-th spoken expression
by the j-th system user






Segmentation of the test spoken expression into
the signal frames x(1)...x(N) where N is dependent
on the total length of the test expression.
x(i) = [x(1)...x(L)] where L is the length of the
signal frame.
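
The segmentation step can be sketched as below. The figure only states that N depends on the length of the expression and that each frame has L samples. Non-overlapping frames and dropping a trailing partial frame are assumptions of this sketch, and the function name is hypothetical.

```python
def segment(signal, frame_length):
    """Split a sampled signal into N consecutive frames of frame_length samples each."""
    # Assumption: frames do not overlap and a trailing partial frame is dropped.
    n_frames = len(signal) // frame_length
    return [signal[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]
```

For a signal of 10 samples and L = 4, this yields N = 2 frames.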




Calculation of the short-term
parameters for each signal
frame: $p_K(i),\ i = 1..N$.
Training set for the short-term
parameters is formed
for the respective speaker:
$TSK_j = \{p_K(i),\ i = 1..N\},\ j = 1..M$



Calculation of the long-term
parameters for each signal
frame: $p_L(i),\ i = 1..N$.
Training set for the long-term
parameters is formed for the
respective speaker:
$TSL_j = \{p_L(i),\ i = 1..N\}$




Calculation of the excitation
parameters for each signal
frame: $p_A(i),\ i = 1..N$.
Speech fundamental-period-normalized
spectral forms of
the LPC remaining signal.
Training set for the excitation
parameters is formed for the
respective speaker:
$TSA_j = \{p_A(i),\ i = 1..N\}$
FIG 8f







Calculation for the short-term
parameters $p_K(i),\ i = 1..N$ in
each frame of the probability
$p(p_K(i) \mid \text{speaker } j)$






Calculation for the long-term
parameters $p_L(i),\ i = 1..N$ in each
frame of the probability
$p(p_L(i) \mid \text{speaker } j)$



Calculation for the excitation
parameters $p_A(i),\ i = 1..N$ in
each frame of the probability
$p(p_A(i) \mid \text{speaker } j)$



Calculation of the probability scores for each signal frame:
$p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } j) = p_K(i)\, p_L(i)\, p_A(i)$



Combination of the results from all signal frames.
Calculation of the test statistic: $WS = \prod_{i=1}^{N} p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } j)$
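
The frame scores and the test statistic can be sketched as follows. The figure defines WS as a plain product over all N frames. Evaluating it as a sum of logarithms is an implementation choice assumed here to avoid numerical underflow for long expressions, and it is not part of the original description. All names are hypothetical.

```python
import math


def frame_score(p_k, p_l, p_a):
    """Probability score of one frame: product of the three per-frame probabilities."""
    return p_k * p_l * p_a


def log_test_statistic(frame_probs):
    """log(WS) for a sequence of (p_K(i), p_L(i), p_A(i)) tuples, one per frame."""
    total = 0.0
    for p_k, p_l, p_a in frame_probs:
        score = frame_score(p_k, p_l, p_a)
        if score == 0.0:
            # A single zero-probability frame makes the whole product zero.
            return float("-inf")
        total += math.log(score)
    return total
```

Since the logarithm is monotonic, comparing `log_test_statistic` values, or comparing one against the logarithm of a threshold, gives the same decision as comparing WS directly.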

FIG 8g



Repetition of the 
preparation phase of the speaker 
identification system 

. 

NO 



YES 



v 

No additional training of the probability 
distributions required. The probability 
distributions are stored in the system 
and are ready for the usage phase. 
FIG 8h



Speaker identification system usage phase 

(Profile for the speaker j) 



Text-independent system 



Recording of any desired spoken 
expression 



Text-dependent system 



Specific word sequence, a sentence or 
a keyword (for example the name of 
the user). 



Segmentation of the spoken expression into the 
signal frames x(1)...x(N) where N is dependent on 
the total length of the spoken expression. 
x(i)=[x(1)...x(L)] where L is the length of the signal 
frame. 



Calculation of the short-term
parameters for each signal
frame: $p_K(i),\ i = 1..N$.
Training set for the short-term
parameters is formed for the
respective speaker:
$TSK_j = \{p_K(i),\ i = 1..N\},\ j = 1..M$



Calculation of the long-term
parameters for each signal
frame: $p_L(i),\ i = 1..N$.
Training set for the long-term
parameters is formed for the
respective speaker:
$TSL_j = \{p_L(i),\ i = 1..N\}$




Calculation of the excitation
parameters for each signal
frame: $p_A(i),\ i = 1..N$.
Speech fundamental-period-normalized
spectral forms of
the LPC remaining signal.
Training set for the excitation
parameters is formed for the
respective speaker:
$TSA_j = \{p_A(i),\ i = 1..N\}$
FIG 8i
Probability distributions of the speech
parameters for speaker 1 (in the form
dependent on the number of system
users)



Probability distributions for the 
short-term parameters 



Probability distributions for the
long-term parameters



Probability distributions for the 
excitation parameters 



Probability distributions of the speech 
parameters for speaker j (in the form 
dependent on the number of system 
users) 



Probability distributions for the 
short-term parameters 



Probability distributions for the 
long-term parameters 



Probability distributions for the 
excitation parameters 



up to speaker M 
FIG 8j



(ia) (2?) up to speaker 



Speaker verification 




Calculation of the probability scores for each signal frame:
$p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } j) = p_K(i)\, p_L(i)\, p_A(i)$



Combination of the results from all signal frames. Calculation
of the test statistic: $WS = \prod_{i=1}^{N} p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } j)$
FIG 8k



WS > Threshold 



NO 



Rejection



YES 



Confirmation of the speaker identity 
FIG 8l




Calculation for the short-term
parameters $p_K(i),\ i = 1..N$ in each
frame of the probabilities
$p(p_K(i) \mid \text{speaker } m),\ m = 1..M$



Results for each 
of the M speakers 



© 



Calculation for the long-term
parameters $p_L(i),\ i = 1..N$ in each
frame of the probabilities
$p(p_L(i) \mid \text{speaker } m),\ m = 1..M$



Results for each of 
the M speakers 




Calculation for the excitation
parameters $p_A(i),\ i = 1..N$ in
each frame of the probabilities
$p(p_A(i) \mid \text{speaker } m),\ m = 1..M$



Results for each of 
the M speakers 



Calculation of the probability scores for each signal frame:

$p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } m) = p_K(i)\, p_L(i)\, p_A(i)$ for each of the M speakers, $m = 1..M$






Combination of the results from all signal frames. Calculation of the test

statistics $WS(m) = \prod_{i=1}^{N} p(p_K(i), p_L(i), p_A(i) \mid \text{speaker } m)$ for each of the M speakers, $m = 1..M$
FIG 8m

Definition of the speaker identity.
Speaker j is chosen for $WS(j) > WS(i),\ j \neq i$

Speaker identity 
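
The identification rule amounts to picking the speaker with the largest test statistic. A minimal sketch follows, assuming the statistics are held in a dictionary keyed by speaker number. The function name and the data layout are hypothetical, and any monotonic transform of WS, such as its logarithm, may be used in place of WS itself.

```python
def identify(ws_by_speaker):
    """Return the speaker j whose test statistic WS(j) is the largest."""
    if not ws_by_speaker:
        raise ValueError("no enrolled speakers")
    return max(ws_by_speaker, key=ws_by_speaker.get)
```

The figure's condition uses a strict inequality and does not say how ties are resolved. This sketch returns the first maximal entry in that case.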



1999P02665WOUS 



Declaration and Power of Attorney For Patent Application 
Erklarung Fur Patentanmeldungen Mit Vollmacht 

German Language Declaration 



Als nachstehend benannter Erfinder erklare ich hiermit an Eides Statt:

As a below named inventor, I hereby declare that:



dass mein Wohnsitz, meine Postanschrift, und meine 
Staatsangehorigkeit den im Nachstehenden nach 
meinem Namen aufgefuhrten Angaben entsprechen, 



dass ich, nach bestem Wissen der ursprungliche, erste 
und alleinige Erfinder (falls nachstehend nur ein Name 
angegeben ist) oder ein ursprunglicher, erster und 
Miterfinder (falls nachstehend mehrere Namen 
aufgefuhrt sind) des Gegenstandes bin, fur den dieser 
Antrag gestellt wird und fur den ein Patent beantragt
wird fur die Erfindung mit dem Titel: 

Verfahren zum Trainieren eines
Sprechererkennungssystems

deren Beschreibung 

(zutreffendes ankreuzen) 

□ hier beigefugt ist. 

☒ am 25.08.2000 als

PCT internationale Anmeldung 

PCT Anmeldungsnummer PCT/DE00/02917 

eingereicht wurde und am 

abgeandert wurde (falls tatsachlich abgeandert). 



Ich bestatige hiermit, dass ich den Inhalt der obigen 
Patentanmeldung einschliesslich der Anspruche 
durchgesehen und verstanden habe, die eventuell 
durch einen Zusatzantrag wie oben erwahnt abgean- 
dert wurde. 



Ich erkenne meine Pflicht zur Offenbarung irgendwel- 
cher Informationen, die fur die Prufung der vorliegen- 
den Anmeldung in Einklang mit Absatz 37, Bundes- 
gesetzbuch, Paragraph 1.56(a) von Wichtigkeit sind, 
an. 



Ich beanspruche hiermit auslandische Prioritatsvorteile 
gemass Abschnitt 35 der Zivilprozessordnung der
Vereinigten Staaten, Paragraph 119 aller unten ange- 
gebenen Auslandsanmeldungen fur ein Patent oder 
eine Erfindersurkunde, und habe auch alle Auslands- 
anmeldungen fur ein Patent oder eine Erfindersurkun- 
de nachstehend gekennzeichnet, die ein Anmelde- 
datum haben, das vor dem Anmeldedatum der 
Anmeldung liegt, fur die Prioritat beansprucht wird. 



My residence, post office address and citizenship are 
as stated below next to my name, 



I believe I am the original, first and sole inventor (if only 
one name is listed below) or an original, first and joint 
inventor (if plural names are listed below) of the 
subject matter which is claimed and for which a patent 
is sought on the invention entitled 



Method for training a speaker recognition 
system 



the specification of which 

(check one) 

□ is attached hereto. 

was filed on 25.08.2000 as 
PCT international application 
PCT Application No. PCT/DE00/02917 

and was amended on 

(if applicable) 



I hereby state that I have reviewed and understand the 
contents of the above identified specification, including 
the claims as amended by any amendment referred to 
above. 



I acknowledge the duty to disclose information which is 
material to the examination of this application in 
accordance with Title 37, Code of Federal Regulations 
§1.56(a).



I hereby claim foreign priority benefits under Title 35, 
United States Code, §119 of any foreign application(s) 
for patent or inventor's certificate listed below and have 
also identified below any foreign application for patent 
or inventor's certificate having a filing date before that 
of the application on which priority is claimed: 
Patent and Trademark Office-U.S. DEPARTMENT OF COMMERCE







Prior foreign applications
Prioritat beansprucht 




Priority Claimed 


19940567.0 DE 
(Number) (Country) 
(Nummer) (Land) 


26.08.1999 M n 
(Day Month Year Filed) Yes No 
(Tag Monat Jahr eingereicht) Ja Nein 


(Number) (Country) 
(Nummer) (Land) 


□ □ 

(Day Month Year Filed) Yes No 
(Tag Monat Jahr eingereicht) Ja Nein 


(Number) (Country) 
(Nummer) (Land) 


□ □ 

(Day Month Year Filed) Yes No 
(Tag Monat Jahr eingereicht) Ja Nein 


Ich beanspruche hiermit gemass Absatz 35 der Zivil- 
prozessordnung der Vereinigten Staaten, Paragraph 
120, den Vorzug aller unten aufgefuhrten Anmel- 
dungen und falls der Gegenstand aus jedem Anspruch 
dieser Anmeldung nicht in einer fruheren 
amerikanischen Patentanmeldung laut dem ersten 
Paragraphen des Absatzes 35 der Zivilprozessordnung
der Vereinigten Staaten, Paragraph 122 offenbart ist, 
erkenne ich gemass Absatz 37, Bundesgesetzbuch, 
Paragraph 1.56(a) meine Pflicht zur Offenbarung von 
Informationen an, die zwischen dem Anmeldedatum 
der fruheren Anmeldung und dem nationalen oder PCT 
internationalen Anmeldedatum dieser Anmeldung
bekannt geworden sind. 


I hereby claim the benefit under Title 35, United States
Code, §120 of any United States application(s) listed
below and, insofar as the subject matter of each of the 
claims of this application is not disclosed in the prior 
United States application in the manner provided by 
the first paragraph of Title 35, United States Code, 
§122, I acknowledge the duty to disclose material 
information as defined in Title 37, Code of Federal 
Regulations, §1.56(a) which occurred between the filing
date of the prior application and the national or PCT 
international filing date of this application. 


PCT/DE00/02917 


25.08.2000 




(Application Serial No.) 
(Anmeldeseriennummer)


(Filing Date D, M, Y) 
(Anmeldedatum T, M, J) 


(Status) (Status) 
(patentiert, anhangig, (patented, pending, 
aufgegeben) abandoned) 


(Application Serial No.)
(Anmeldeseriennummer)


(Filing Date D, M, Y)
(Anmeldedatum T, M, J)


(Status) (Status) 
(patentiert, anhangig, (patented, pending, 
aufgegeben) abandoned)


Ich erklare hiermit, dass alle von mir in der vorliegen- 
den Erklarung gemachten Angaben nach meinem 
besten Wissen und Gewissen der vollen Wahrheit 
entsprechen, und dass ich diese eidesstattliche Erkla- 
rung in Kenntnis dessen abgebe, dass wissentlich und 
vorsatzlich falsche Angaben gemass Paragraph 1001, 
Absatz 18 der Zivilprozessordnung der Vereinigten 
Staaten von Amerika mit Geldstrafe belegt und/oder 
Gefangnis bestraft werden konnen, und dass derartig
wissentlich und vorsatzlich falsche Angaben die
Gultigkeit der vorliegenden Patentanmeldung oder eines
darauf erteilten Patentes gefahrden konnen. 


I hereby declare that all statements made herein of my 
own knowledge are true and that all statements made 
on information and belief are believed to be true, and 
further that these statements were made with the 
knowledge that willful false statements and the like so 
made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States 
Code and that such willful false statements may 
jeopardize the validity of the application or any patent 
issued thereon. 






VERTRETUNGSVOLLMACHT: Als benannter Erfinder
beauftrage ich hiermit den nachstehend benannten
Patentanwalt (oder die nachstehend benannten
Patentanwalte) und/oder Patent-Agenten mit der
Verfolgung der vorliegenden Patentanmeldung sowie
mit der Abwicklung aller damit verbundenen Geschafte
vor dem Patent- und Warenzeichenamt: (Name und
Registrationsnummer anfuhren)



POWER OF ATTORNEY: As a named inventor, I 
hereby appoint the following attorney(s) and/or 
agent(s) to prosecute this application and transact all 
business in the Patent and Trademark Office 
connected therewith, (list name and registration 
number) 



And I hereby appoint 



Customer No. 



Telefongesprache bitte richten an: 
(Name und Telefonnummer) 



Direct Telephone Calls to: (name and telephone 
number) 

Ext. 



Postanschrift: Send Correspondence to: 

Bell, Boyd & Lloyd LLC 
70 West Madison Street, Suite 3300, Chicago, Illinois 60602-4207
Telephone: +1 312 372 1121 and Facsimile +1 312 372 2098 

or 

Customer No. 



Voller Name des einzigen oder ursprunglichen Erfinders:

MARCIN KUROPATWINSKI


Full name of sole or first inventor: 

MARCIN KUROPATWINSKI 


Unterschrift des Erfinders Datum

^^^j>^ 13a Mai 01 


Inventor's signature 


Date 


Wohnsitz

P-Gdansk, POLEN 


Residence 

P-Gdansk, POLAND 


Staatsangehorigkeit

PL 


Citizenship 

PL 


Postanschrift 

c/o. ul. Danusi 1/3


Post Office Address

c/o. ul. Danusi 1/3 


80434 P-Gdansk


80434 P-Gdansk 


Voller Name des zweiten Miterfinders (falls zutreffend): 


Full name of second joint inventor, if any: 


Unterschrift des Erfinders Datum 


Second Inventor's signature 


Date 


Wohnsitz 


Residence 


Staatsangehorigkeit 


Citizenship 



Postanschrift 



Post Office Address 



(Bitte entsprechende Informationen und Unterschriften im Falle von dritten und weiteren Miterfindern angeben).
(Supply similar information and signature for third and subsequent joint inventors).